ABSTRACT


The basic design problems of a document retrieval system
are reviewed. A simple optimum iterative feedback system is
proposed that makes use of two interrelated sets of parameters
supplied respectively by the user and the system. The set of
user parameters is designed to reflect the user's own point of
view on the search subject matter, while the set of system
parameters is designed to reveal some data base characteristics.
A set of index terms and their corresponding significance values
is abstracted from the Computing Science data base by an
automatic indexing algorithm based on some statistical
association measures. In order to eliminate storage shortage
problems created by large matrices such as the document-term
matrix, a least-storage scheme and a subscript-matching
algorithm are developed to assist manipulations of these large
matrices. Some relevance judgment criteria are defined and a
relevance measure is derived. The optimum iterative feedback
algorithm is first described for search on document title terms
only, and is then generalized to include search on other
relevant items such as author names. Finally, the
convergence of the algorithm is verified for both cases.
CHAPTER I


The design of an automatic information retrieval system 
includes basically the organization of data; the manipulation of 
information; the formulation of search logic; the definition of 
query language; the implementation of search strategy; the 
presentation of output; the evaluation of system performance; 
and, finally, the optimization of system effectiveness. Since 
information retrieval systems are user-oriented in nature, the 
prime objective of system design is to achieve the retrieval of 
all relevant, and only relevant, information in response to any 
user's query. An ideal system can thus be regarded as one that 
retrieves from a given data base all the relevant information 
while at the same time rejecting all information that is 
irrelevant to any given search request. However, such an ideal
system never occurs in practice.


Indeed, there are numerous factors that govern retrieval
performance. Human errors and system incompatibilities are the
major sources of discrepancy. Human errors may be further
subdivided into designer errors and user errors. Examples of
designer errors are inaccurate representation of information,
such as through spelling errors; poor search strategy; and
ambiguities in formal query language definition. Common user
errors arise from poor request formulation and a poor grasp of
system capabilities. Examples of system incompatibilities are
bugs in programming; poor decision-making, such as in determining
a threshold value at a certain stage of the search process; and
inconsistency between search logic and search functions,
resulting in misinterpretation of the query. Fortunately, these
deficiencies are normally controllable by means of careful
planning and management. In many systems, some form of
optimization process may be employed to reduce the noise in the
search output and so ensure satisfactory responses to the search
requests.


This leads to the remaining, and yet the most controversial,
problem: the judgment of relevance of the final search output.
Relevance assessment depends entirely on the individual's
viewpoint. The user, for instance, is primarily interested in
obtaining information that satisfies his particular need, and not
in whether the retrieved information does, in fact, match his
search request. The system designer, on the other hand, has to
make sure that the retrieved information matches the logic of
the query. Thus, relevance judgment in this context is rather
unreliable. An alternative is to employ several judges or groups
of judges at one time. However, experimental results indicate
that under different conditions even the same judge may give
different relevance judgments to the same query and the same
corresponding set of output. Only when specific guidelines are
followed is a substantial gain in the stability of relevance
judgment observed [1,2]. Consequently, it may be hypothesized
that a set of well-defined relevance judgment criteria is
essential for any retrieval system whose performance depends
heavily on relevance measures. The optimization of search output
can then be carried out without ambiguity.


1.2 Optimization Methods 


Most retrieval systems that optimize search output make use
of iterative search techniques. A fairly straightforward
semiautomatic approach is to present the user with the search
output together with a set of machine-generated index terms
derived on the basis of their association within the system. The
user then makes a relevance judgment according to some
predefined criteria. If the search output is not satisfactory,
he may reformulate his initial query by selecting more
significant terms from the index list and resubmit the modified
query for another search run. The revised query is expected to
lead to better search results because it reflects the system
parameters as well as the user's own point of view. He may
repeat the same procedure until a fully satisfactory output is
obtained [3]. However, this method of search optimization is
inefficient in that it is time-consuming and costly. In some
cases, users may soon become frustrated by waiting and repeating
the same routine over and over again.
A more sophisticated approach that has been widely
experimented with is the use of real-time, man-machine
interaction [4 to 6]. The basic principles behind this method
are exactly the same as in the one just described. The system
makes a finite number of searches interspersed with user
reformulation of the search request with the aid of additional
index terms displayed on the terminal. The user may find terminal
use amusing for the first few times. After a while he will
probably become bored at having to wait for responses and make
changes. As a result of all the inconvenience he may sometimes
lose track of his original information need. Furthermore, as most
computer systems give terminal jobs the highest priority for
execution, on-line iterative searches are far more expensive
than other searches. Finally, the most serious drawback of
on-line searching is that terminals are often tied up
unnecessarily and so are unavailable for other purposes. From
these observations, it can be concluded that a more economical
and effective system is to be preferred.


In a recent study by Heaps and Ko [7 to 9], a method known
as the "automatic adaptive processing of questions" is examined.
Four criteria are derived and tested separately. The users need
only specify an estimate of relevance to their search requests.
The system then modifies the requests automatically according to
one of the four criteria and obtains an optimum set of weights
for internal use. Search results show that the final output may
contain some relevant information that the requesters neglected
to mention in their queries. The non-iterative and completely
automatic nature of this model eliminates the painstaking and
time-consuming efforts normally required of the users of other
systems. However, the system is not without shortcomings, the
main one being that it requires more computer time because large
matrices, and many computations, are involved.


1.3 The Model 


An alternative approach can be represented by the simple
model shown in Fig. 1.1. The model may be called an automatic
optimum iterative feedback document retrieval system because it
makes use of automatic iterative feedback control to optimize
search outputs. The method consists of three phases, namely, the
pre-search phase, the search phase, and the post-search phase.
In the pre-search phase, the user formulates his search requests
with the aid of a set of index terms and their significance
values, automatically abstracted from the data base according to
some attribute measures based on statistical associations. He
then submits his requests coded in conformity with the query
language described in Backus Normal Form. In this manner,
ambiguous requests can readily be detected and rejected. As a
result, the accepted requests are assumed to contain all the
relevant information needed for search purposes as well as for
search optimization purposes.
Fig. 1.1 The Model


The search phase is responsible for deciding the degree of
relevance that a document has in relation to the requests. A
document is classed as provisionally relevant only if its
relevance value is greater than, or equal to, a predetermined
cutoff value. The members of the set of provisionally relevant
documents, if the set is not empty, are arranged in descending
order of relevance. The post-search phase then examines this set
to determine whether the predefined relevance criteria are met.
Whenever they are not met, control is passed back to the search
phase after the initial (or previously modified) search requests
have been modified. Such examination and modification is
repeated a finite number of times until the relevance criteria
are met.
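The three-phase loop of Fig. 1.1 can be sketched in Python as follows. This is only a skeleton under stated assumptions: the request representation, the placeholder relevance value (a sum of matching term weights), the relevance criterion and the request-modification rule are all hypothetical stand-ins, since the actual definitions are developed later in the thesis.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, float]          # hypothetical: term -> request weight
Doc = Tuple[str, List[str]]         # hypothetical: (document id, title terms)
Output = List[Tuple[str, float]]    # (document id, relevance value)


def relevance(request: Request, terms: List[str]) -> float:
    # Placeholder relevance value: sum of the weights of matching title terms.
    return sum(request.get(t, 0.0) for t in set(terms))


def search_phase(request: Request, docs: List[Doc], cutoff: float) -> Output:
    # Keep documents whose relevance value is >= the cutoff ("provisionally
    # relevant") and arrange them in descending order of relevance.
    scored = [(doc_id, relevance(request, terms)) for doc_id, terms in docs]
    kept = [(d, r) for d, r in scored if r >= cutoff]
    return sorted(kept, key=lambda dr: dr[1], reverse=True)


def feedback_search(request: Request, docs: List[Doc], cutoff: float,
                    criteria_met: Callable[[Output], bool],
                    modify: Callable[[Request, Output], Request],
                    max_iter: int = 10) -> Tuple[Output, bool]:
    # Post-search phase: examine the output and, if the criteria are not met,
    # modify the request and pass control back to the search phase.
    output: Output = []
    for _ in range(max_iter):
        output = search_phase(request, docs, cutoff)
        if criteria_met(output):
            return output, True
        request = modify(request, output)
    return output, False


docs = [("d1", ["TIME", "SHARE"]), ("d2", ["SHARE", "SYSTE"]), ("d3", ["GRAPH"])]
out, ok = feedback_search({"TIME": 1.0, "SHARE": 0.4}, docs, cutoff=1.0,
                          criteria_met=lambda o: len(o) >= 2,
                          modify=lambda req, o: {t: w + 0.5 for t, w in req.items()})
```

The `max_iter` bound mirrors the requirement that examination and modification be repeated only a finite number of times; the returned flag reports whether the criteria were actually met.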
CHAPTER II


AN AUTOMATIC INDEXING ALGORITHM


The data base used is the CSDATA tape prepared for
experimental use in Course CS560 within the Department of
Computing Science at the University of Alberta. It is made up of
approximately 7,000 journal articles taken from various current
computing science journals. Each journal name is represented by
an ASTM (American Society for Testing and Materials) coden. A
list of ASTM codens and journal names used in CSDATA is included
in Appendix A.


To facilitate editing and updating of data, the tape is
blocked into logical records of 80 bytes according to the format
shown in Fig. 2.1. All author names (excluding initials) and
title words are truncated to five letters. Each author name is
followed by a slash (/), and each title word by a blank. Words of
fewer than five letters are left-justified and followed by the
appropriate number of blanks. Hyphenated title words are coded
as separate words, while all insignificant words such as
prepositions are eliminated. When more than one logical record
is required, the letter 'C' is specified in column 80 to indicate
continuation of data to columns 14-79 of the next logical record.
The data in columns 1-13 are repeated for the purpose of article
identification. Finally, the data base is sorted alphanumerically
in ascending order on columns 1-13, with a secondary sort in
descending order on column 80.
Fig. 2.1 Format of ...
The following is a typical example of a logical record
belonging to the data base:


JACOA66130317KRISH/WOOD /TIME SHARE OPERA INTER SERVI TIMES EXPON 


which is the code for an article appearing in Vol. 13, 1966, of
the Journal of the Association for Computing Machinery, starting
on page 317, written by B. Krishnamoorthi and R. C. Wood and
entitled "Time-Shared Operations with Both Interarrival and
Service Times Exponential". Heaps [10] has argued that the
benefits of truncation outweigh most of its disadvantages. Most
importantly, truncation saves both storage and search time.
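A logical record such as the one above can be split into its fields as sketched below. The field boundaries within columns 1-13 (coden in 1-5, year in 6-7, volume in 8-9, starting page in 10-13) are inferred from the example record rather than read from Fig. 2.1, so treat them as an assumption; merging of continuation records is not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicalRecord:
    coden: str              # columns 1-5 (assumed)
    year: str               # columns 6-7 (assumed)
    volume: str             # columns 8-9 (assumed)
    start_page: str         # columns 10-13 (assumed)
    authors: List[str]      # truncated surnames, each followed by '/'
    title_terms: List[str]  # truncated title words
    continued: bool         # 'C' in column 80


def parse_record(line: str) -> LogicalRecord:
    line = line.ljust(80)                   # pad to the full 80-byte record
    ident, body, flag = line[:13], line[13:79], line[79]
    *authors, title = body.split("/")       # text after the last '/' is the title
    return LogicalRecord(
        coden=ident[0:5], year=ident[5:7], volume=ident[7:9], start_page=ident[9:13],
        authors=[a.strip() for a in authors],
        title_terms=title.split(),
        continued=(flag == "C"),
    )


rec = parse_record("JACOA66130317KRISH/WOOD /TIME SHARE OPERA INTER SERVI TIMES EXPON")
```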


It is worth noting at this point that the effectiveness of a
document retrieval system depends very much on the selection of
the data base. The suitability of CSDATA for the present system
has been carefully studied. First of all, CSDATA is homogeneous
in the sense that all its keywords are related to the field of
computing science. Therefore, homonyms are very unlikely to
occur. Since the system deals with semantic information, the
exclusion of homonyms considerably simplifies the search process.
Secondly, the terminology of CSDATA is reasonably stable and not
too specialized. Hence, the number of keywords that may be
ambiguously interpreted due to truncation is kept to a minimum.
At the same time, no special treatment is required for any
subset of keywords. Lastly, the collection is believed to be
large enough to allow abstraction of significant data
characteristics.


2.2 The Automatic Indexing Algorithm 


As the system places greater emphasis on the output than
the input, it is not necessary to develop a thesaurus. Instead,
effort is devoted to generation of a comprehensive list of index
terms. This is achieved by an automatic indexing algorithm based
on the statistical association of terms within the document
collection. It has been shown that terms which co-occur in
document titles more frequently than the average are
semantically, as well as statistically, related [11 to 17]. The
set of index terms should satisfy the general goal of automatic
indexing, which is to provide a compact representation of the
information content of the given data base.




To determine the importance of a term by means of
statistical association techniques, it is necessary to make use
of the co-occurrence frequency data, together with the total
frequency data, in order to define some statistical association
measures that reflect the degree of relatedness of the term with
others. Before applying any association measure, it is desirable
to exclude very high frequency terms (common terms) as well as
very low frequency terms from the data base since such terms are
semantically insignificant. Usually, a stop-list will serve this
purpose.


Statistical association measures are generally expressed in
terms of one or more of the following elements:

    $f_j$ = the frequency of occurrence of term$_j$,
    $f_{ij}$ = the frequency of co-occurrence of term$_i$ and term$_j$, and
    $n$ = the total number of documents in the data base.

Let $w_{ij}$ be the indicator of the presence of term$_j$ in document$_i$:

$$ w_{ij} = \begin{cases} 1 & \text{if term}_j \text{ is in document}_i, \\ 0 & \text{otherwise.} \end{cases} $$

Then, by definition,

$$ f_j = \sum_{i=1}^{n} w_{ij} \tag{2.1} $$

and

$$ f_{ij} = \sum_{k=1}^{n} w_{ki}\, w_{kj} \tag{2.2} $$

Suppose m is the total number of distinct terms in the data
base; the matrix $W = (w_{ij})_{n \times m}$ is often called the
document-term matrix. Hence, $f_j$ is equal to the j-th column sum
of W, and $f_{ij}$ is equal to the dot product of the transpose of
the i-th column and the j-th column of W. Also, by definition,
$f_{ij} = f_{ji}$ for all $i, j$.
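Equations (2.1) and (2.2) amount to column sums and column dot products of W, as this small Python sketch shows. The 3×3 matrix is hypothetical, and dense lists are used only for clarity; at the scale of CSDATA such dense storage is impractical, as Section 2.3 explains.

```python
from typing import List

Matrix = List[List[int]]


def term_frequencies(W: Matrix) -> List[int]:
    # (2.1): f_j = sum_i w_ij, i.e. the j-th column sum of W.
    return [sum(row[j] for row in W) for j in range(len(W[0]))]


def cooccurrence_frequencies(W: Matrix) -> Matrix:
    # (2.2): f_ij = sum_k w_ki * w_kj, the dot product of columns i and j of W.
    m = len(W[0])
    return [[sum(row[i] * row[j] for row in W) for j in range(m)] for i in range(m)]


# Hypothetical 3-document x 3-term matrix (rows = documents, columns = terms).
W = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]
f = term_frequencies(W)            # [2, 3, 2]
F = cooccurrence_frequencies(W)    # symmetric, with F[j][j] == f[j]
```

The diagonal check also illustrates why $f_{jj} = f_j$: each $w_{ij}$ is 0 or 1, so $w_{ij}^2 = w_{ij}$.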


There are numerous statistical association measures that
have been introduced into the literature of information storage
and retrieval. A list of some of the most frequently used
measures is given in Appendix B. Not surprisingly, each measure
has been found to have its own merits and demerits. In fact,
some measures are similar to one another in that they give
equivalent rankings to the same set of terms [12]. Now, suppose
that, according to some appropriate measure,

    $c_{ij}$ = the extent to which term$_i$ is associated with term$_j$
           in the data base.

The matrix $C = (c_{ij})_{m \times m}$ is often called a term-term
association matrix. In the experiments to follow, three of the
most common association measures are tested. They are referred
to as measures (2.3a), (2.3b) and (2.3c).
It is convenient to define $f_{ii} = f_i$ when evaluating each
of the above measures. It is obvious that each association
measure gives rise to a symmetric term-term association matrix
such that $c_{ij} = c_{ji}$ for all $i, j$.

Now, the extent to which a term is associated with a
document may be considered as the extent to which the term is
associated with the terms in the document title. As a result, a
term may bear a certain degree of relatedness to a document even
though it does not appear in the document title. This feature
can be interpreted as arising from the fact that different terms
can have similar contexts and hence may be used as substitutes
for one another. In practice, these relations are used to aid
indexing of new documents [17]. Let

    $g_{ij}$ = the extent to which term$_j$ is associated with document$_i$
           = the weighting factor for term$_j$ in relation to document$_i$.

In accordance with the above view we may define

$$ g_{ij} = \frac{\sum_{k=1}^{m} w_{ik}\, c_{kj}}{\sqrt{\sum_{k=1}^{m} w_{ik}^{2}\, \sum_{k=1}^{m} c_{kj}^{2}}} \tag{2.4} $$

Note that $\sum_{k=1}^{m} w_{ik}^{2} = \sum_{k=1}^{m} w_{ik}$, since each $w_{ik}$ is either 0 or 1.
Now, let $w_i = (w_{i1}, w_{i2}, \ldots, w_{im})$, where $i = 1, 2, \ldots, n$,
represent the i-th row of W, and let $(\cdot)^t$ denote the transpose
of $(\cdot)$. Similarly, let $c_j = (c_{1j}, c_{2j}, \ldots, c_{mj})^t$, where
$j = 1, 2, \ldots, m$, represent the j-th column of C. Then,

$$ g_{ij} = \frac{w_i\, c_j}{\|w_i\|\, \|c_j\|} \tag{2.5} $$

where $\|\cdot\|$ is the Euclidean norm. The matrix $G = (g_{ij})_{n \times m}$ is
called the weighted document-term matrix. In matrix notation,
equation (2.5) becomes

$$ G = D_W\, W C\, D_C \tag{2.6} $$

where $D_W = \mathrm{diag}(1/\|w_1\|, \ldots, 1/\|w_n\|)$ and $D_C = \mathrm{diag}(1/\|c_1\|, \ldots, 1/\|c_m\|)$.
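The element-wise definition (2.5) and the matrix form (2.6) can be checked against each other numerically. In this sketch W and C are small hypothetical matrices, the helper names are illustrative, and a zero norm is mapped to a weight of 0, a convention the text does not specify.

```python
import math
from typing import List

Matrix = List[List[float]]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def column(A: Matrix, j: int) -> List[float]:
    return [row[j] for row in A]


def g_elementwise(W: Matrix, C: Matrix) -> Matrix:
    # (2.5): g_ij = (w_i . c_j) / (||w_i|| ||c_j||); 0 if either norm is 0.
    G = []
    for w in W:
        G_row = []
        for j in range(len(C[0])):
            c = column(C, j)
            den = norm(w) * norm(c)
            G_row.append(sum(a * b for a, b in zip(w, c)) / den if den else 0.0)
        G.append(G_row)
    return G


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diag(values: List[float]) -> Matrix:
    return [[v if r == c else 0.0 for c in range(len(values))] for r, v in enumerate(values)]


def g_matrix_form(W: Matrix, C: Matrix) -> Matrix:
    # (2.6): G = D_W (W C) D_C with D_W = diag(1/||w_i||), D_C = diag(1/||c_j||).
    D_W = diag([1 / norm(w) if norm(w) else 0.0 for w in W])
    c_norms = [norm(column(C, j)) for j in range(len(C[0]))]
    D_C = diag([1 / x if x else 0.0 for x in c_norms])
    return matmul(matmul(D_W, matmul(W, C)), D_C)


W = [[1, 1, 0], [0, 1, 1]]                                # hypothetical documents
C = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]   # hypothetical symmetric C
G1, G2 = g_elementwise(W, C), g_matrix_form(W, C)
```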


Suppose $g^j = (g_{1j}, g_{2j}, \ldots, g_{nj})^t$, where $j = 1, 2, \ldots, m$,
represents the j-th column of G. Then, according to the above
assumption, the elements of G represent the extent to which a
term is related to a document in the data base. The measure of
the extent to which a term is related to all documents can be
defined as

$$ \gamma_j = \|g^j\|, \qquad j = 1, 2, \ldots, m, \tag{2.7} $$

where $\gamma_j$ is called the significance value of term$_j$ in the data base.
In order to determine the set of index terms that carries
significant information content, an arbitrary cut-off value K is
imposed. The average of all $\gamma_j$, $j = 1, 2, \ldots, m$, may be
considered a reasonable cut-off value. Hence, by definition,

$$ K = \frac{1}{m} \sum_{j=1}^{m} \gamma_j \tag{2.8} $$

Consequently, every term $t_j$ such that $\gamma_j \ge K$ will be regarded as
an index term. Suppose there are m' such terms, which
constitute the set of index terms I; then, in set notation,

$$ I = \{\, t_j \mid \gamma_j \ge K,\ j = 1, 2, \ldots, m \,\}, \qquad |I| = m'. $$
The automatic indexing algorithm is summarized in the
following statements:


1. Create the document-term matrix W.


2. Generate the term-term association matrix C using an appropriate
statistical association measure.


3. Calculate G by equation (2.6) to form the weighted document-term matrix.

4. Calculate the significance values $\gamma_j$ for all $j = 1, 2, \ldots, m$.


5. Calculate K and determine the set of index terms I.
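The five steps can be strung together in a compact Python sketch. Two assumptions are labeled in the code: the cosine measure $c_{ij} = f_{ij}/\sqrt{f_i f_j}$ stands in for measures (2.3a) to (2.3c), and the significance value of a term is taken to be the Euclidean norm of its column of G. The toy titles are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def index_terms(docs: List[List[str]]) -> Tuple[Dict[str, float], float, Set[str]]:
    terms = sorted({t for d in docs for t in d})
    m = len(terms)
    doc_sets = [set(d) for d in docs]
    # Step 1: document-term matrix W of 0/1 indicators (rows = documents).
    W = [[1 if t in s else 0 for t in terms] for s in doc_sets]
    # Step 2: term-term association matrix C. ASSUMPTION: cosine measure
    # c_ij = f_ij / sqrt(f_i f_j) in place of measures (2.3a)-(2.3c).
    f = [sum(row[j] for row in W) for j in range(m)]
    C = [[sum(r[i] * r[j] for r in W) / math.sqrt(f[i] * f[j]) for j in range(m)]
         for i in range(m)]
    # Step 3: weighted document-term matrix G by (2.5).
    c_norm = [math.sqrt(sum(C[k][j] ** 2 for k in range(m))) for j in range(m)]
    G = []
    for row in W:
        w_norm = math.sqrt(sum(row))  # sum of w_ik^2 equals sum of w_ik for 0/1 entries
        G.append([sum(row[k] * C[k][j] for k in range(m)) / (w_norm * c_norm[j])
                  if w_norm else 0.0 for j in range(m)])
    # Step 4: ASSUMPTION: gamma_j = Euclidean norm of the j-th column of G.
    gamma = {t: math.sqrt(sum(G[i][j] ** 2 for i in range(len(G))))
             for j, t in enumerate(terms)}
    # Step 5: cut-off K = average significance value (2.8); keep gamma_j >= K.
    K = sum(gamma.values()) / m
    return gamma, K, {t for t, g in gamma.items() if g >= K}


docs = [["TIME", "SHARE", "SYSTE"], ["SHARE", "SYSTE"], ["GRAPH"]]  # hypothetical titles
gamma, K, I = index_terms(docs)
```

For these toy titles, SHARE and SYSTE clear the cut-off, while TIME falls just below it and GRAPH, which co-occurs with nothing, falls well below it.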
2.3 Storage Problems


A subset of 5,150 journal articles is taken from CSDATA for
testing the automatic indexing algorithm. It can be seen from
the flowchart of Fig. 2.2 that the procedures are rather
straightforward. Nevertheless, a complication arises because very
large matrices are involved in computations at various stages of
the algorithm. Even after high and low frequency terms are
eliminated, there are 1,801 distinct terms in the test data. Thus,
the document-term matrix alone (5,150 × 1,801 elements of four
bytes each) will require approximately 37 million bytes of
storage. Obviously, the conventional method of storage and matrix
multiplication cannot be used. It is therefore necessary to
develop an appropriate technique to cope with the problems created
by such matrices.
[Flowchart: read data base and create term file and W, using a stop-list; sort term file into alphabetic order by the IBM Sort & Merge Package; obtain $f_j$ by frequency count; eliminate high and low frequency terms and assign a rank number to each term; match rank numbers to obtain $f_{ij}$; compute $C = (c_{ij})$ using formulae (2.3a), (2.3b) and (2.3c); apply subscript-matching algorithm to obtain $G = (g_{ij})$; compute $\gamma_j$ for all j, and K]


Fig. 2.2 Flowchart for the Automatic Indexing Algorithm
The simplest approach is to store the elements of these
matrices in the most economical storage format; matrix
multiplication can then be carried out by applying a fairly
simple subscript-matching algorithm [18]. Consider the matrices
W, C, and G. Since they are very sparse, only non-zero elements
need be stored. So that the original matrix can be restored
efficiently, the row number, the number of non-zero elements in
the row, and the column number and value of each non-zero
element are stored. The storage format, shown in Fig. 2.3, is
known as the least-storage scheme.


The example in Fig. 2.3 shows the record for row i, which
contains $p_i$ non-zero elements. For a matrix of dimension n×m,
the symbols are interpreted as follows:

    $i$ = i-th row indicator, $1 \le i \le n$,
    $p_i$ = number of non-zero elements in row i, $0 \le p_i \le m$,
    $j_\sigma$ = $\sigma$-th column indicator, which points to the column in the
          original matrix, $\sigma = 1, 2, \ldots, p_i$, $1 \le j_\sigma \le m$,
    $v_{ij_\sigma}$ = value of the element in the i-th row and the $j_\sigma$-th column.
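A record of the least-storage scheme can be modeled with Python tuples, as sketched below. The tuple layout $(i, p_i, [(j_\sigma, v), \ldots])$ and the function names are assumptions for illustration; in the text these fields are stored as words in a sequential file.

```python
from typing import List, Tuple

# One record per non-null row: (i, p_i, [(j_sigma, v_ij_sigma), ...]), 1-based.
Record = Tuple[int, int, List[Tuple[int, float]]]


def to_least_storage(A: List[List[float]]) -> List[Record]:
    records = []
    for i, row in enumerate(A, start=1):
        pairs = [(j, v) for j, v in enumerate(row, start=1) if v != 0]
        if pairs:  # a row with no non-zero elements produces no record
            records.append((i, len(pairs), pairs))
    return records


def from_least_storage(records: List[Record], n: int, m: int) -> List[List[float]]:
    # Restore the original n x m matrix from its records.
    A = [[0.0] * m for _ in range(n)]
    for i, p_i, pairs in records:
        assert p_i == len(pairs), "record count does not match its elements"
        for j, v in pairs:
            A[i - 1][j - 1] = v
    return A


A = [[0, 2, 0],
     [0, 0, 0],
     [5, 0, 1]]
records = to_least_storage(A)   # [(1, 1, [(2, 2)]), (3, 2, [(1, 5), (3, 1)])]
```

Because $p_i$ is stored with each record, a reader can tell how many (column, value) pairs follow without any end-of-row marker.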
It is fairly easy to show that the least-storage scheme
requires much less storage than the conventional method, which
stores every single element of the matrix. Given an n×m sparse
matrix, suppose

    $S$ = the total number of words required to store the given
          matrix by the conventional method, and
    $S_\ell$ = the total number of words required to store the given
          matrix by the least-storage scheme.

Assuming that at least one non-zero element appears in each row
or column of the matrix, then, by definition,

$$ S = nm, \qquad S_\ell = \sum_{i=1}^{n} (2 + 2p_i) = 2n(1 + h), $$

where $h = \frac{1}{n}\sum_{i=1}^{n} p_i$ is the average number of non-zero
elements per row of the matrix. It follows that, for a given h,
$S > S_\ell$ for every $m > 2(1 + h)$. This is shown graphically in Fig. 2.4.
Fig. 2.4 Comparing Storage Requirements for the
Conventional Storage Scheme and the Least-storage Scheme
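The word counts behind the comparison of Fig. 2.4 can be written out directly. This sketch assumes one word each for $i$, $p_i$, every column indicator and every value, which is the accounting that yields $S_\ell = 2n(1+h)$; the sizes used are hypothetical.

```python
from typing import List


def conventional_words(n: int, m: int) -> int:
    # S: every one of the n*m elements occupies one word.
    return n * m


def least_storage_words(nonzeros_per_row: List[int]) -> int:
    # S_l: per row, one word each for i and p_i, plus one word each for
    # the column indicator and the value of every non-zero element.
    return sum(2 + 2 * p for p in nonzeros_per_row)


n, h = 10, 3                        # hypothetical: 10 rows, 3 non-zeros per row
S_l = least_storage_words([h] * n)  # 2n(1 + h) = 80
threshold = 2 * (1 + h)             # least-storage wins once m > 8
```

At $m = 2(1+h)$ the two schemes tie exactly, and each further column adds n words to S but nothing to $S_\ell$.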
In CSDATA, the average number of terms per document is
approximately six. Hence, storing the document-term matrix by
the least-storage scheme requires about 1/150 of the storage
required by the conventional method. It may also be noted that
for a sparse symmetric matrix such as the term-term association
matrix, the storage requirement can be further reduced by
storing only the diagonal and upper (or lower) triangular
non-zero elements. When using the least-storage scheme, care
must be taken to note that for certain rows (columns) the
diagonal and upper (or lower) triangular elements may all be
equal to zero, in which case the record for that row (column)
will not appear in the storage file. Such a row (column) is said
to be null with respect to the storage file.


As an illustration, consider the sparse symmetric matrix A
given by:


Then, the entire matrix will be stored in a sequential file as:


Note that the fourth row (column) of A is null with respect to
the storage file.
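Storage of only the diagonal and upper triangle, and the resulting null rows, can be sketched as follows. The 4×4 matrix is hypothetical (chosen for illustration, not the matrix A of the text) so that its fourth row, like that of A, is null with respect to the storage file; `lookup` is an illustrative helper that recovers any element through symmetry.

```python
from typing import List, Tuple

Record = Tuple[int, int, List[Tuple[int, float]]]   # (i, p_i, [(j, v), ...]), 1-based


def upper_least_storage(A: List[List[float]]) -> List[Record]:
    # Keep only diagonal and upper-triangular non-zeros of a symmetric matrix.
    records = []
    for i in range(len(A)):
        pairs = [(j + 1, A[i][j]) for j in range(i, len(A)) if A[i][j] != 0]
        if pairs:  # otherwise row i+1 is null w.r.t. the storage file
            records.append((i + 1, len(pairs), pairs))
    return records


def lookup(records: List[Record], i: int, j: int) -> float:
    # Recover a_ij via symmetry by reading row min(i, j), column max(i, j).
    i, j = min(i, j), max(i, j)
    for row, _p, pairs in records:
        if row == i:
            return dict(pairs).get(j, 0)
    return 0  # row i is null with respect to the storage file


A = [[1, 0, 2, 0],
     [0, 3, 0, 4],
     [2, 0, 5, 0],
     [0, 4, 0, 0]]    # hypothetical; its fourth row (column) is null
records = upper_least_storage(A)
```

The element $a_{42} = 4$ survives only as $a_{24}$ in the record for row 2, which is why no record for row 4 is needed.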


Consider a general m×m sparse symmetric matrix. Let

    $S'$ = the total number of words required to store the diagonal
          and upper triangular elements of the given matrix by the
          conventional method, and

    $S_\ell'$ = the total number of words required to store the diagonal
          and upper triangular elements of the given matrix by
          the least-storage scheme.
Fig. 2.5 Comparing Storage Requirements for the ...
Thus the least-storage scheme saves a tremendous amount of
storage space. In many instances, a matrix stored in this format
can be loaded in core, thereby eliminating the costly and
time-consuming I/O accesses that would be required if the matrix
were stored on an auxiliary storage device. A very simple
subscript-matching algorithm has been devised in conjunction with
the least-storage scheme so that matrix multiplications can be
carried out effectively and efficiently.
2.4 Programming Considerations


In this section, some programming details for generating
the set of index terms are discussed briefly. Several
intermediate files are essential for the entire process.


(i) Data File: 


The data base is read sequentially. Each document is
assigned a document number and each title term a position number
according to its sequence of occurrence. For each term of each
document, a record is written in the format shown in Fig. 2.6.


Document No.    Position No.


Fig. 2.6 Record Format of Data File


The set of sequential records constitutes the data file. Note
that this file preserves the original information of the data
base.


(ii) Document-term File:


The document-term file is the data file arranged in
alphabetical order of terms. This file is also called the
inverted-index file.
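The data file and the document-term (inverted-index) file can be sketched in a few lines. The record layout (term, document number, position number) is an assumption, since Fig. 2.6 names only the document and position numbers; the titles are hypothetical.

```python
from typing import List, Tuple

DataRecord = Tuple[str, int, int]   # ASSUMED layout: (term, document no., position no.)


def make_data_file(docs: List[List[str]]) -> List[DataRecord]:
    # One record per title term, written in data-base order.
    return [(term, doc_no, pos)
            for doc_no, terms in enumerate(docs, start=1)
            for pos, term in enumerate(terms, start=1)]


def make_document_term_file(data_file: List[DataRecord]) -> List[DataRecord]:
    # The data file re-ordered alphabetically by term: an inverted-index file.
    return sorted(data_file)


docs = [["TIME", "SHARE"], ["SHARE", "GRAPH"]]   # hypothetical truncated titles
inverted = make_document_term_file(make_data_file(docs))
```

Sorting whole tuples orders by term first and then by document and position, so all postings for a term end up adjacent.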
(iii) Frequency File:


The frequency file consists of the frequency $f_j$ of every
distinct term in the data base. It is easily created by counting
the number of records that contain the term. A record of the
frequency file is shown in Fig. 2.7.


Term    Frequency


Fig. 2.7 Record Format of Frequency File
(iv) Term-directory File:


The frequency file is sorted in descending order of 
frequency. High and low frequency terms are eliminated. Each 
term is then assigned a rank number according to its sequence in 
the sorted frequency file. In the case when several terms have 
the same frequency of occurrence, consecutive numbers are 
assigned arbitrarily. A record of the term-directory file is 


shown in Fig. 2.8. 
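The frequency file and the term-directory file can be derived as sketched below. The `low` and `high` thresholds are hypothetical parameters (the text excludes frequency-1 terms but states no upper bound), and ties are broken alphabetically, which is one of the arbitrary orders the text permits.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def term_directory(doc_term_file: List[Tuple[str, int, int]],
                   low: int = 1, high: Optional[int] = None) -> Dict[str, int]:
    # Frequency file: count the records that contain each term.
    freq = Counter(term for term, _doc, _pos in doc_term_file)
    # Sort in descending order of frequency; ties broken alphabetically here.
    ordered = sorted(freq.items(), key=lambda tf: (-tf[1], tf[0]))
    # Drop low-frequency terms (f <= low) and, optionally, high-frequency ones.
    kept = [t for t, f in ordered if f > low and (high is None or f < high)]
    # Rank numbers follow the sequence of the sorted, filtered file.
    return {t: rank for rank, t in enumerate(kept, start=1)}


records = [("GRAPH", 2, 2), ("SHARE", 1, 2), ("SHARE", 2, 1),
           ("SHARE", 3, 1), ("TIME", 1, 1), ("TIME", 3, 2)]   # hypothetical
directory = term_directory(records)
```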
(v) Term-pair File:


From the data file, all possible term pairs are abstracted 
from each record to create the term-pair file. A record of the 


term-pair file is shown in Fig. 2.9. 


(vi) Co-occurrence Frequency File:


The term-pair file is modified by interchanging the terms 
in any term pair whose second term has lower alphanumeric value 
than its first term. The file is then sorted alphabetically 
according to the term pairs. The frequency of co-occurrence of 
each term pair is then counted. A record of the co-occurrence 


frequency file is shown in Fig. 2.10. 


Fig. 2.10 Record Format of Co-occurrence Frequency File
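The term-pair and co-occurrence frequency files can be sketched as follows, assuming that term pairs are formed within each document title. Pairs are put in canonical order (lower term first), sorted, and each run of identical pairs is counted, mirroring the steps described above; the titles are hypothetical.

```python
from itertools import combinations, groupby
from typing import List, Tuple


def cooccurrence_file(docs: List[List[str]]) -> List[Tuple[Tuple[str, str], int]]:
    # Term-pair file: every pair of terms within one document title.
    pairs = []
    for terms in docs:
        for a, b in combinations(terms, 2):
            if b < a:  # interchange so the first term has the lower value
                a, b = b, a
            pairs.append((a, b))
    # Sort by term pair, then count each run of identical pairs.
    pairs.sort()
    return [(pair, len(list(run))) for pair, run in groupby(pairs)]


docs = [["TIME", "SHARE", "SYSTE"], ["SYSTE", "SHARE"]]   # hypothetical titles
cooc = cooccurrence_file(docs)
```

Without the interchange, ("SYSTE", "SHARE") and ("SHARE", "SYSTE") would be counted as different pairs.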
(vii) Term-term Association File:


Three different term-term association measures are used.
Note that since C is symmetric, only the diagonal and upper
triangular non-zero elements are stored. A record of the
term-term association file is shown in Fig. 2.11, where
$c_{ij}^{(k)}$, k = 1, 2, 3, are computed according to equations
(2.3a), (2.3b) and (2.3c), respectively.
Fig. 2.11 Record Format of Term-term Association File
(viii) Weighted Document-term File:


The document-term file and the term-term association file
are used to form the weighted document-term file. Rank numbers
are used to represent subscripts of the elements of the various
matrices. A record of the weighted document-term file is shown
in Fig. 2.12, where $g_{ij}^{(k)}$, k = 1, 2, 3, corresponds to the
respective measures $c_{ij}^{(k)}$, k = 1, 2, 3.
Fig. 2.12 Record Format of Weighted Document-term File
(ix) Index-term File:


For each weighted document-term file, the respective cut-off
value according to (2.8) is calculated and an index-term file is
determined.


The generation of all these files is quite simple except
for the weighted document-term file. The document-term file and
the term-term association file are transformed into the
least-storage format so that the subscript-matching algorithm can
be applied. The algorithm is discussed in general below.
There are three possible cases.

Case 1: If the j-th column of C is not null with respect to File C
and ..., then for each match such that r = s, where
... these products yield $g_{ij}$. Diagrammatically, the matching
mechanism is as shown in Fig. 2.13.


Fig. 2.13 Matching Mechanism (Case 1)


Finally, the subscript-matching algorithm is presented
formally in the following manner:


The flowchart shown in Fig. 2.16 describes the
subscript-matching algorithm.
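The matching idea described above, pairing the subscript r of a stored W element with the subscript s of a stored C element and multiplying whenever r = s, can be sketched as follows. This is a minimal sketch under assumptions: rows of W and columns of C are held as subscript-sorted (subscript, value) lists, a two-pointer merge performs the matching, the full columns of the symmetric C are rebuilt from its stored upper triangle, and the result is the un-normalized product W C, which (2.5) would then scale. All names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[int, float]]   # (subscript, value), sorted by subscript


def match_product(row: Pairs, col: Pairs) -> float:
    # Two-pointer walk: multiply values whose subscripts match (r == s).
    a = b = 0
    total = 0.0
    while a < len(row) and b < len(col):
        r, s = row[a][0], col[b][0]
        if r == s:
            total += row[a][1] * col[b][1]
            a += 1
            b += 1
        elif r < s:
            a += 1
        else:
            b += 1
    return total


def full_columns(upper: Dict[int, Pairs]) -> Dict[int, Pairs]:
    # Rebuild each column of symmetric C from its stored diagonal/upper rows.
    cols: Dict[int, Pairs] = {}
    for i, pairs in upper.items():
        for j, v in pairs:
            cols.setdefault(j, []).append((i, v))
            if i != j:
                cols.setdefault(i, []).append((j, v))
    return {j: sorted(p) for j, p in cols.items()}


def sparse_wc(w_rows: Dict[int, Pairs], c_upper: Dict[int, Pairs]) -> Dict[Tuple[int, int], float]:
    # Un-normalized products (W C)_ij; null columns of C never enter the loop.
    cols = full_columns(c_upper)
    out: Dict[Tuple[int, int], float] = {}
    for i, row in w_rows.items():
        for j, col in cols.items():
            v = match_product(row, col)
            if v:
                out[(i, j)] = v
    return out


W_rows = {1: [(1, 1.0), (2, 1.0)], 2: [(2, 1.0), (3, 1.0)]}             # hypothetical
C_upper = {1: [(1, 1.0), (2, 0.5)], 2: [(2, 1.0), (3, 0.2)], 3: [(3, 1.0)]}
WC = sparse_wc(W_rows, C_upper)
```

Each match costs one step through both lists, so a row-column pair is processed in time proportional to the number of stored elements rather than to m.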
2.5 The Index Term List


There are a total of 29,677 terms in the test data of 5,150
documents, but only 3,787 distinct terms.
The distribution of terms in documents is shown in the graph of
Fig. 2.17. The average number of terms per document is
approximately 5.76. The distribution of the number of documents
that contain a given term is shown in the graph of Fig. 2.18.
The average number of documents per term is approximately 32.65.
After eliminating non-significant terms by a stop-list and
excluding terms of low frequency (frequency = 1), a total of
1,801 distinct terms are left to be processed. Similarly, there
are 84,450 possible term pairs but only 11,038 pairs are used.


Since the matrices are very large, the calculation of G = … would require a tremendous amount of computing time. Therefore, several random samples of different sizes are tested by calling the IBM pseudo random number generator subroutine CS003A, which is written in FORTRAN IV. Each document has been assigned a document number, and uniformly distributed pseudo random numbers in the closed interval [0, 5150] are generated by the following calling sequence:


      CALL CS003A (INIT)
      DO 1 N = 1, J
    1 CALL CS003C (A, B, SIZE, N)


where INIT is a positive odd integer input value to initialize the algorithm,
J is the sample size,
[A, B] = [0, 5150] are real input parameters for the lower and upper limits of the interval,
SIZE is the random number returned by CS003C, and
N is the sequence number.
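The sampling step can be sketched in modern terms as follows. CS003A and CS003C are IBM library routines whose internals are not described here, so the sketch substitutes Python's `random.Random` seeded with INIT. Mapping each real value onto a document number by rounding up, and redrawing repeated document numbers, are assumptions, since the text does not say how duplicates were handled.

```python
import math
import random
from typing import List


def sample_documents(init: int, j: int, a: float = 0.0, b: float = 5150.0) -> List[int]:
    """Draw j distinct document numbers from uniform reals in [a, b]."""
    if init <= 0 or init % 2 == 0:
        raise ValueError("INIT must be a positive odd integer")
    if j > math.floor(b):
        raise ValueError("sample size exceeds the number of documents")
    rng = random.Random(init)              # stands in for CALL CS003A (INIT)
    chosen: List[int] = []
    seen = set()
    while len(chosen) < j:
        size = rng.uniform(a, b)           # stands in for CALL CS003C (A, B, SIZE, N)
        doc_no = max(1, math.ceil(size))   # assumption: map the real value onto 1..5150
        if doc_no not in seen:             # assumption: repeats are simply redrawn
            seen.add(doc_no)
            chosen.append(doc_no)
    return chosen


for j in (50, 100, 200):
    print(j, sample_documents(init=12345, j=j)[:5])
```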

Three samples of size 50, 100 and 200 were tested. As the sample size increased, the sets of index terms resulting from the samples were found to converge to the same limiting set of index terms. The index term lists that result from using measures (2.3a), (2.3b) and (2.3c) contain 815, 816 and 823 different index terms, respectively; all the index terms that appear in the first list also appear in the other two, and they share approximately the same rank in each case. It can thus be concluded that the set of index terms common to all lists is representative of the significant terms of the data base. The final set of index terms is then chosen to be the intersection of the sets resulting from the three different index term lists, using a sample size of 200. The corresponding significance values are taken to be the mean of the corresponding three significance values. The set of significance values is further normalized into the interval [0, 1], and will eventually constitute a subset of the feedback control parameters. The three different sets of index terms, and the final set of index terms together with the significance values, are given in Appendix C and Appendix D, respectively.
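The construction of the final list (intersection, mean, normalization) can be sketched as below. The text does not say how the mean significance values were normalized into [0, 1]; this sketch assumes min-max rescaling. Term names and values are hypothetical.

```python
from typing import Dict, List


def final_index_terms(term_lists: List[Dict[str, float]]) -> Dict[str, float]:
    """Intersect term lists, average significance values, rescale to [0, 1]."""
    common = set(term_lists[0])
    for terms in term_lists[1:]:
        common &= set(terms)               # keep terms present in every list
    if not common:
        raise ValueError("the term lists have no term in common")
    mean = {t: sum(terms[t] for terms in term_lists) / len(term_lists) for t in common}
    lo, hi = min(mean.values()), max(mean.values())
    span = hi - lo
    # assumption: min-max rescaling; if all means are equal they map to 1.0
    return {t: (v - lo) / span if span else 1.0 for t, v in mean.items()}


lists = [
    {"PICTURE": 2.0, "RETRIEVAL": 4.0, "MATRIX": 1.0},   # e.g. from measure (2.3a)
    {"PICTURE": 4.0, "RETRIEVAL": 8.0},                  # e.g. from measure (2.3b)
    {"PICTURE": 3.0, "RETRIEVAL": 6.0, "GRAPH": 9.0},    # e.g. from measure (2.3c)
]
print(final_index_terms(lists))
```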
CHAPTER III
DEFINITIONS OF SYSTEM FUNCTIONS


3.1 The Query Language 


For any document retrieval system, it is essential to formulate a query language designed to convey exactly the user's information need to the search processor. The transformation process is based on some logic operations implicitly contained in the query language. In fact, what constitutes the core of the search logic and the query language is the set of logic operators used to formulate the search requests. Hence, it is extremely important that the syntax of the query language be well-defined. Conventionally, the Backus Normal Form (BNF) is used. Table 3.1 gives a brief explanation of the BNF symbols.


Symbol      Meaning

< >         variable name or expression
::=         is defined to be
{ }_a^b     repeat m number of times, where m ∈ [a, b], a and b being integers
|           exclusive OR
b           blank

Table 3.1 Interpretation of BNF Symbols
The following specifications in BNF represent the syntax of the query language used in the present system:


<search request> ::= <question statement>{<parameter>}_1^8<end statement>

<question statement> ::= QUEb<remark><relevance estimate> | QUEb<relevance estimate> | <comment statement><question statement>

<parameter> ::= <leading statement>{<subsequent statement>}_0^9

<leading statement> ::= <comment statement><leading statement> | <primary logic operator>b<search particulars><weight>

<primary logic operator> ::= AND | NOT

<search particulars> ::= <search type>b<search item>

<search type> ::= <author> | <coden> | <title term> | <year>

<author> ::= A

<coden> ::= C

<title term> ::= T

<year> ::= Y

<search item> ::= item to be searched.

<weight> ::= {<decimal digit>}_0^4

<decimal digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

<comment statement> ::= bbb<remark>

<remark> ::= a string of symbols.

<subsequent statement> ::= <secondary logic operator>b<search particulars><weight> | <comment statement><subsequent statement>

<secondary logic operator> ::= OR | NOR

<relevance estimate> ::= <recall estimate><precision estimate>

<recall estimate> ::= <weight> | <recall estimate>

<precision estimate> ::= <weight> | <precision estimate>

<end statement> ::= END | ENDb<remark>

The search processor may be implemented so that it is capable of performing a search for either a single search request or a batch of search requests. This is enabled by the definition:

<batch> ::= {<search request>}_1^n

where n is the maximum number of search requests the system can handle. Since a sequential search technique is used, the advantage of batching is obviously a considerable reduction of search time. So that an error in one search request does not affect the other members of the batch, incorrect search requests are treated as if they do not belong to the batch. Upon output, appropriate error messages are issued.


In a batch of search requests, the QUEStion statement and the END statement of each request serve as delimiters. Each request allows up to eight parameters. Each parameter is led by a statement using one of the primary logic operators AND or NOT. It is then followed by not more than nine other statements in any combination of the secondary logic operators OR and NOR.


Four search types can be used. They are author name, journal coden name, title term and year of publication, respectively denoted by A, C, T and Y. Each search request item may be given an arbitrary term weight of up to four digits. In the absence of an assigned weight, the default value of one is automatically assigned. At the same time, the user may specify on each QUEStion statement his anticipated recall and precision values, as defined in Section 5.1, which have a default of one hundred percent. The term weights and the estimated recall and precision values will eventually constitute a subset of the feedback control parameters. Note that any number of comment statements may appear anywhere in a search request. They do not contribute to the search operations, but are merely designed for the users to make remarks.


A user may submit his batch of search requests either in the form of a deck of cards or via a terminal. In either case, the appropriate input format must be used for the different kinds of statements. The input formats can be generally classified into four types as shown in Fig. 3.1 (a) to (d).
Fig. 3.1 Search Request Input Format
(a) Type I: QUEStion Statement (fields: Comment, Recall Estimate, Precision Estimate)
(b) Type II: Comment Statement (field: Comment)
(d) Type IV: END Statement
Note that columns 73-80 of any input format type can be used freely by the user; sequence numbers or identifiers may sometimes prove useful there. The contents of these columns are ignored by the search processor.


The following is an example of a batch of two search requests.

QUE SAMPLE REQUEST #1 SOS 685
SUBMITTED BY USER ALO.

AND A ABDALI SK 120
OR A LEVIALDI S

TOPIC OF INTEREST IS PICTURE PROCESSING.
AND T PICTURE 50
AND T PROCESSING

NOT LIKELY TO APPEAR IN THE YEARS 68 TO 70.
NOT Y 68
NOT Y 69
OR Y 70

CODEN NAME OF JOURNAL IS CACMA OR PAT.
AND C CACMA 30
OR C PAT 40
END

SAMPLE REQUEST #2.
QUE
REQUIRE ALL ARTICLES BY G. SALTON,
APPEAR IN THE JOURNAL IFSRA IN 1971.
AND A SALTON G
AND C IFSRA
AND Y 71
ALSO REQUIRE A PAPER BY A. ROSENFELD,
APPEAR IN THE JOURNAL JACOA IN 1966.
AND A ROSENFELD A
AND Y 66
END OF SAMPLE REQUEST #2.
END OF BATCH OF TWO SEARCH REQUESTS.




The first example requests a weighted search for any document written by ABDALI SK or LEVIALDI S on the subject matter of PICTURE PROCESSING and appearing in the journal called the Communications of the Association for Computing Machinery (CACMA) or the journal called Pattern Recognition (PAT), but not from 1968 to 1970. The second example specifically requests all the articles written by SALTON G that appeared in the journal called Information Storage and Retrieval (IFSRA) in 1971. It also requests a paper written by ROSENFELD A that appeared in the Journal of the Association for Computing Machinery (JACOA) in 1966.
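The statement structure described above can be illustrated with a small parser. This is a hypothetical sketch, not the thesis's search processor: the real processor reads fixed card columns (Fig. 3.1), whereas the sketch splits on blanks, treats any line that does not start with a logic operator followed by A, C, T or Y as a comment statement, and reads a trailing number of up to four digits as the weight. It then groups the statements of sample request #1 into parameters and checks the limits of eight parameters and nine subsequent statements.

```python
from typing import List, NamedTuple, Optional

PRIMARY = {"AND", "NOT"}     # open a new parameter
SECONDARY = {"OR", "NOR"}    # continue the current parameter
SEARCH_TYPES = {"A", "C", "T", "Y"}


class Statement(NamedTuple):
    operator: str
    search_type: str
    item: str
    weight: int


def parse_statement(line: str) -> Optional[Statement]:
    """Parse one search statement; return None for anything else (comments)."""
    tokens = line.split()
    if (len(tokens) < 3 or tokens[0] not in PRIMARY | SECONDARY
            or tokens[1] not in SEARCH_TYPES):
        return None
    op, stype, rest = tokens[0], tokens[1], tokens[2:]
    weight = 1                                   # default weight of one
    # assumption: a trailing token of 1-4 digits is the weight, unless it is
    # the only token after the search type (e.g. the year in "NOT Y 68")
    if len(rest) > 1 and rest[-1].isdigit() and len(rest[-1]) <= 4:
        weight = int(rest.pop())
    return Statement(op, stype, " ".join(rest), weight)


def group_parameters(lines: List[str]) -> List[List[Statement]]:
    """Group the statements between QUE and END into parameters."""
    params: List[List[Statement]] = []
    for line in lines:
        st = parse_statement(line)
        if st is None:
            continue                             # comment statement: ignored
        if st.operator in PRIMARY:
            params.append([st])                  # AND/NOT leads a parameter
        elif params:
            params[-1].append(st)                # OR/NOR joins the last one
        else:
            raise ValueError("OR/NOR statement before any AND/NOT statement")
    if len(params) > 8 or any(len(p) > 10 for p in params):
        raise ValueError("more than 8 parameters or 9 subsequent statements")
    return params


request_1 = [
    "SUBMITTED BY USER ALO.",
    "AND A ABDALI SK 120", "OR A LEVIALDI S",
    "TOPIC OF INTEREST IS PICTURE PROCESSING.",
    "AND T PICTURE 50", "AND T PROCESSING",
    "NOT LIKELY TO APPEAR IN THE YEARS 68 TO 70.",
    "NOT Y 68", "NOT Y 69", "OR Y 70",
    "AND C CACMA 30", "OR C PAT 40",
]
for param in group_parameters(request_1):
    print(param)
```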




3.2 Relevance Criteria 


As stated in the formal query language definition, the user may assign weights to the terms that comprise his queries. These weights are designed to reflect the user's own point of view about the term usage or subject matter. A weak point in this approach is that the user often does not have the slightest idea how much weight he needs to assign to a term, or how the weights should be related to one another, in order that the system will interpret his viewpoints correctly. In some instances, a term considered important by the user may be insignificant to the system. To remedy this, the list of index terms and their significance values is presented to the user to assist his preliminary judgment of the importance of terms. However, relevance judgment of the set of retrieved documents also depends on some system parameters. The set of relevance judgment criteria that takes into account both the user's viewpoint and the system parameters may be developed in general as follows.


Let $D = \{d_1, d_2, \ldots, d_n\}$ be a set of n documents that constitutes the data base. Suppose m is the number of distinct terms used to index all the documents. Then these index terms can be regarded as defining an m-dimensional term space, denoted by $T^m$. Each document $d_i$, $i = 1, 2, \ldots, n$, is then represented by a vector or m-tuple $(d_{i1}, d_{i2}, \ldots, d_{im})$ in $T^m$, in which each $d_{ij}$, $j = 1, 2, \ldots, m$, is the significance value of the j-th term in the i-th document. Furthermore, the significance value of a term is bounded on $[a, b]$, where $b > a > 0$ are real numbers.

Suppose a query is denoted by $q = (q_1, q_2, \ldots, q_m)$, where each $q_j$, $j = 1, 2, \ldots, m$, is the weight attached to the j-th term of $T^m$ and is bounded on $[a, b]$. These weights may be assigned manually or may be a result of automatic adjustment. Again, q may be regarded as a vector or m-tuple in $T^m$. It is then possible to define some criteria to determine whether any given document $d_i$ is relevant to q. This set of criteria is known as the relevance judgment criteria, or simply the relevance criteria. These criteria are given in Definition 3.1.

Definition 3.1

Suppose $\omega$, $\sigma$, $\delta$ and $\varepsilon$ are arbitrarily small, positive, real numbers. Then a document $d_i$ is said to be relevant to a given query q if and only if one of the following conditions is satisfied:


1. $d_i = q$. (3.1)

2. $|d_{ij} - q_j| \le \omega$ for $j = 1, 2, \ldots, m$. (3.2)

3. $\|d_i - q\| \le \sigma$. (3.3)

4. $$\beta_1\left(1 - \frac{d_i \cdot q}{\|d_i\|\,\|q\|}\right) + \beta_2\,\bigl|\,\|d_i\| - \|q\|\,\bigr| \le \delta, \qquad (3.4)$$
   where $\beta_1, \beta_2 \in [0, 1]$ are real constants.

5. $n_c / n_d \ge 1 - \varepsilon$, (3.5)
   where $n_c$ = number of common terms used to index $d_i$ and q, and $n_d$ = number of different terms used to index $d_i$ and q.


Obviously, condition one is the most desirable relevance criterion of all, since the document $d_i$ is exactly specified by the query q and perfect matching occurs in that $d_{ij} = q_j$ for $j = 1, 2, \ldots, m$. In general, however, this condition is too restrictive for practical purposes. Therefore, some tolerance values are introduced to allow more flexible judgment of relevance. Condition two implies that if each absolute difference between a significance value of $d_i$ and the corresponding weight of q is less than or equal to a given tolerance value $\omega$, then the document $d_i$ is taken to be relevant to the query q. In the case when all these absolute values are equal to zero, condition one is maintained. Similarly, condition three states that if the length of the vector difference of $d_i$ and q is less than or equal to a given tolerance value $\sigma$, then the document $d_i$ will be regarded as relevant to the query q. If the value of $\|d_i - q\|$ is zero, condition one is again maintained.

The inclusion of $\beta_1$ and $\beta_2$ in condition four allows varying stress of importance upon either the angle between $d_i$ and q or the absolute value of the difference in lengths of the two vectors. The usefulness of $\beta_1$ and $\beta_2$ will be seen in Section 3.3. Lastly, condition five merely states that if the number of common terms used to index $d_i$ and q is to some degree close to the number of different terms used to index $d_i$ and q, then the document $d_i$ can be regarded as relevant to the query q. This criterion, as well as criteria two, three and four, may result in judging the document $d_i$ relevant even though some of the terms used in $d_i$ do not appear in the query q but are actually related in context to it.
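The five conditions of Definition 3.1 can be checked mechanically, as in the sketch below. It follows the forms of (3.1) to (3.5) as written above, including the ratio form of (3.5); the tolerance values in the example are hypothetical, and treating a term as "used to index" a vector when its entry is non-zero is an assumption.

```python
import math
from typing import List, Sequence


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def satisfied_conditions(d: Sequence[float], q: Sequence[float],
                         omega: float, sigma: float, delta: float, eps: float,
                         beta1: float, beta2: float) -> List[int]:
    """Return which of conditions (3.1)-(3.5) hold for document d and query q."""
    met = []
    if list(d) == list(q):                                    # (3.1) exact match
        met.append(1)
    if all(abs(a - b) <= omega for a, b in zip(d, q)):        # (3.2)
        met.append(2)
    if _norm([a - b for a, b in zip(d, q)]) <= sigma:         # (3.3)
        met.append(3)
    nd, nq = _norm(d), _norm(q)
    if nd > 0 and nq > 0:                                     # (3.4)
        cos = sum(a * b for a, b in zip(d, q)) / (nd * nq)
        if beta1 * (1 - cos) + beta2 * abs(nd - nq) <= delta:
            met.append(4)
    # assumption: a term indexes a vector when its entry is non-zero
    n_c = sum(1 for a, b in zip(d, q) if a != 0 and b != 0)
    n_d = sum(1 for a, b in zip(d, q) if a != 0 or b != 0)
    if n_d and n_c / n_d >= 1 - eps:                          # (3.5)
        met.append(5)
    return met


def is_relevant(d: Sequence[float], q: Sequence[float], **tolerances: float) -> bool:
    """Definition 3.1: relevant if at least one condition holds."""
    return bool(satisfied_conditions(d, q, **tolerances))


tol = dict(omega=0.1, sigma=0.1, delta=0.05, eps=0.1, beta1=1.0, beta2=0.0)
print(satisfied_conditions([0.5, 0.2, 0.0], [0.45, 0.2, 0.05], **tol))  # [2, 3, 4]
```

The example also shows the point made above: the document is judged relevant although its third term is absent from the document and present only in the query.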


3.3 A Relevance Measure 
According to the set of relevance criteria of Definition 3.1, it is possible to have several documents judged as relevant to a given query. Hence, it is desirable to have some kind of relevance measure which can distinguish the more relevant documents from the less relevant ones. Recall that (3.4) provides an option for emphasis on either the angle between $d_i$ and q or the absolute value of the difference in lengths of the two vectors. For example, $\beta_1$ and $\beta_2$ can both be set equal to one-half to indicate that the two quantities are equally important. It is, of course, possible to have any combination of values for $\beta_1$ and $\beta_2$. The determination of $\beta_1$ and $\beta_2$ depends, for example, on the definition of $d_i$ and q, the relationship between the two sets of weights, and so forth. For instance, under the above definition of $d_i$ and q, one may set $(\beta_1, \beta_2)$ to be (0, 1), since if the coordinates of the two vectors are close together, then the angle between the two vectors must tend to be zero.


Now we can simplify and generalize Definition 3.1 by redefining $d_i$ and q in the following manner. Suppose $d_i = (d_{i1}, d_{i2}, \ldots, d_{im})$, $i = 1, 2, \ldots, n$, such that all $d_{ij}$, $j = 1, 2, \ldots, m$, are bounded on $[a, b]$, where $b > a > 0$ are real constants. Let $q = (q_1, q_2, \ldots, q_m)$ such that all $q_j$, $j = 1, 2, \ldots, m$, are bounded on $[a', b']$, where $b' > a' > 0$ are real constants and b' and a' are some multiples of b and a respectively. Recall that the derivation of the significance values of terms is based on some statistical association measures, which are in turn functions of the frequencies of occurrence and co-occurrence of terms. Hence, the significance value of a term is a measure of its importance relative to other terms, not a measure of absolute importance. As a result, the vectors $d_i$ and q are closely related if the angle between them is close to zero. Hence, we can assign $(\beta_1, \beta_2)$ to be (1, 0). Consequently, we can define a simplified and generalized relevance criterion for the present system as given in Definition 3.2.


Definition 3.2


A document $d_i$ is said to be relevant to a given query q if and only if

$$\tau \le R(d_i, q) \le 1, \qquad (3.6)$$

where

$$R(d_i, q) = \frac{d_i \cdot q}{\|d_i\|\,\|q\|} \qquad (3.7)$$

= relevance measure of $d_i$ to q,

and $\tau$ is a pre-determined cutoff value.


The correspondence between (3.6) and (3.4) is easy to derive. According to the above arguments, $(\beta_1, \beta_2) = (1, 0)$. Therefore, from (3.4), we have

$$1 - R(d_i, q) \le \delta,$$

that is, letting $\tau = 1 - \delta$, $R(d_i, q) \ge \tau$. Since $R(d_i, q)$ is the cosine of the angle between $d_i$ and q, it is naturally less than or equal to one. The derivation of (3.6) is thus completed.
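A minimal sketch of the relevance measure (3.7) and criterion (3.6) follows. Returning 0 for an all-zero vector and clamping the cosine at 1 (floating-point rounding can push it slightly above 1 for parallel vectors) are choices made here, not in the text. The first example shows why only the angle matters: two vectors with the same direction but different lengths have R = 1.

```python
import math
from typing import Sequence


def relevance(d: Sequence[float], q: Sequence[float]) -> float:
    """R(d, q) of (3.7): cosine of the angle between d and q."""
    nd = math.sqrt(sum(a * a for a in d))
    nq = math.sqrt(sum(b * b for b in q))
    if nd == 0 or nq == 0:
        return 0.0                          # assumption: an empty vector matches nothing
    cos = sum(a * b for a, b in zip(d, q)) / (nd * nq)
    return min(1.0, cos)                    # guard against rounding just above 1


def is_relevant(d: Sequence[float], q: Sequence[float], tau: float) -> bool:
    """Criterion (3.6): tau <= R(d, q) <= 1."""
    return tau <= relevance(d, q) <= 1.0


delta = 0.05
tau = 1 - delta                             # tau = 1 - delta when (beta1, beta2) = (1, 0)
print(relevance([3.0, 4.0], [6.0, 8.0]))            # 1.0: same direction, different lengths
print(is_relevant([1.0, 0.0], [1.0, 1.0], tau))     # False: R is about 0.707
```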


There are a few interesting properties associated with the relevance measure $R(d_i, q)$. They are as follows:
(1) When $R(d_i, q) = 1$, condition one of Definition 3.1 is maintained.

(2) $\theta = \arccos\{R(d_i, q)\}$ is the angular distance between $d_i$ and q.

(3) The function $A(d_i, q) = \ldots R(d_i, q)$, in which $R(d_i, q)$ satisfies the condition of (3.6), is a metric which satisfies the following axioms:
   (i) $A(d_i, q) \ge 0$ and $A(d_i, d_i) = 0$;
   (ii) $A(d_i, q) = A(q, d_i)$;
   (iii) $A(d_i, q) \le A(d_i, d_j) + A(d_j, q)$;
   (iv) if $d_i \ne q$, then $A(d_i, q) > 0$.

(4) When $R(d_i, q) = 0$, $d_i \cdot q = 0$ implies that the two vectors do not have a single term in common.


Having defined the relevance measure, it is then possible to define the degree of relevance of a document in relation to a given query. This is given in Definition 3.3.


Definition 3.3


Let $D_R = \{d_i \mid i = 1, 2, \ldots, k\}$ be the set of k documents judged relevant to a given query q by Definition 3.2. Suppose $R^* = \{r_i \mid i = 1, 2, \ldots, k\}$ is the corresponding set of relevance values determined by (3.7). Then the degree of relevance of any document $d_i \in D_R$ to q may be defined to be the relevance value $r_i$. Furthermore, a document $d_i \in D_R$ is said to be more relevant to q than any other document $d_j \in D_R$ if and only if the relevance value of $d_i$ is greater than that of $d_j$, i.e., $r_i > r_j$.


According to this definition, the set of provisionally relevant documents can be arranged in descending order of their relevance values, thus showing the relative degree of relevance of the documents to the given query. This arrangement enables the system to decide which members of this set are indeed relevant. It also plays an important role in the query modification of the optimum iterative feedback algorithm to follow.
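The ordering of Definition 3.3 can be sketched as follows: compute R for every document, keep those satisfying (3.6), and sort them in descending order of relevance value. Document names, vectors and the cutoff are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def relevance(d: Sequence[float], q: Sequence[float]) -> float:
    """R(d, q) of (3.7), clamped at 1 to absorb rounding."""
    nd = math.sqrt(sum(a * a for a in d))
    nq = math.sqrt(sum(b * b for b in q))
    if nd == 0 or nq == 0:
        return 0.0
    return min(1.0, sum(a * b for a, b in zip(d, q)) / (nd * nq))


def rank_by_relevance(docs: Dict[str, Sequence[float]], q: Sequence[float],
                      tau: float) -> List[Tuple[str, float]]:
    """Definition 3.3: keep documents with tau <= R <= 1, most relevant first."""
    scored = [(doc_id, relevance(vec, q)) for doc_id, vec in docs.items()]
    hits = [(doc_id, r) for doc_id, r in scored if tau <= r <= 1.0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


docs = {"d1": [1.0, 0.0], "d2": [1.0, 1.0], "d3": [0.0, 1.0]}
for doc_id, r in rank_by_relevance(docs, q=[1.0, 0.2], tau=0.8):
    print(doc_id, round(r, 3))
```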
CHAPTER IV


THE OPTIMUM ITERATIVE FEEDBACK ALGORITHM 




Before discussing the development of the optimum feedback algorithm, it is necessary to give a general description of the parameters involved. There are two interrelated sets of parameters, called respectively the set of user parameters and the set of system parameters.


It is observed from the formal query language definition given in Section 3.1 that the user may specify the kind of output he anticipates in terms of recall and precision. Suppose E(r) is the expected recall value and E(p) is the expected precision value, both belonging to the interval [0, 100]. Now, let

r = normalized expected recall value

$$= E(r)/100, \qquad (4.1)$$

and p = normalized expected precision value

$$= E(p)/100. \qquad (4.2)$$

Suppose a set of $m' < m$ title terms is used in the query. Then the set of term weights assigned by the user, $Q = \{q_i \mid i = 1, 2, \ldots, m'\}$, together with r and p, forms the set of user parameters U, which is denoted in set notation as

$$U = \{r, p, Q\}. \qquad (4.3)$$


Recall from Definition 3.2 that a document $d_i$ is regarded as relevant to the query q if and only if $\tau \le R(d_i, q) \le 1$. Note that the system transforms the set of term weights, Q, into a vector $q = (q_1, q_2, \ldots, q_m)$ by rearranging the subscripts according to the order of the index terms. For any index term that receives no weight in Q, the value of zero is inserted. From the system's point of view, the value of $\tau$ should be bounded so that the validity of the relevance measure and the effectiveness of system performance are maintained. Since the closer the value of $R(d_i, q)$ is to one, the higher is the degree of relevance of the retrieved document $d_i$, it is natural to think that $\tau$ should be assigned as close to one as possible. However, having $\tau$ too close to one will most likely result in high precision but low recall. Conversely, having $\tau$ too far away from one will most likely result in high recall and low precision. As a compromise, let us define a threshold value T in terms of r and p as
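$$T = \max\left\{0.7,\; 1 - \left|\frac{\log r}{1 + r^2}\right| - \left|\frac{\log p}{1 + p^2}\right|\right\}, \qquad (4.4)$$

where log is the common logarithm. Alternatively, (4.4) may be represented approximately by

$$T = \begin{cases} 1 - \left|\dfrac{\log r}{1 + r^2}\right| - \left|\dfrac{\log p}{1 + p^2}\right|, & \text{if } rp \ge 0.4, \quad (4.5a)\\ 0.7, & \text{if } rp < 0.4. \quad (4.5b) \end{cases}$$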
The effect of (4.4) is to bound T in the interval [0.7, 1.0], which is a reasonable range to maintain effective selection of relevant documents. The values of T as given by (4.5a) for values of r and p between 0.5 and 1.0 in steps of 0.1 are given in Table 4.1. The graphs of T versus rp as defined by (4.5a) and (4.5b) are given in Fig. 4.1.
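Equation (4.4) can be evaluated directly, as in the sketch below. It reads "log r/(1 + r²)" as (log r)/(1 + r²); this reading is an assumption, supported by the switch at rp = 0.4 in (4.5): for r = p = √0.4 the unclamped expression is about 0.716, close to the 0.7 floor. Zero recall or precision estimates are rejected here because the logarithm is undefined at zero.

```python
import math


def threshold_T(expected_recall: float, expected_precision: float) -> float:
    """Upper threshold T of (4.4) from the user's E(r) and E(p), given in percent."""
    r = expected_recall / 100.0        # (4.1)
    p = expected_precision / 100.0     # (4.2)
    if not (0.0 < r <= 1.0 and 0.0 < p <= 1.0):
        raise ValueError("E(r) and E(p) must lie in (0, 100]")
    penalty = (abs(math.log10(r) / (1.0 + r * r))
               + abs(math.log10(p) / (1.0 + p * p)))
    return max(0.7, 1.0 - penalty)


print(threshold_T(100, 100))           # 1.0 (the default of one hundred percent)
print(round(threshold_T(80, 90), 3))   # 0.916
print(threshold_T(50, 50))             # 0.7 (clamped by the floor in (4.4))
```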
Consider the case when no document is judged relevant to a request by the relevance criterion of (3.6). Suppose there are some documents whose relevance values are short of the value of $\tau$ but are, to a certain extent, close to it. The discrepancy may come from the set of term weights, Q, assigned by the user. It is then quite possible that, by modifying the original search request, some of these documents or some other related documents in the data base may be judged as relevant to the request. It is therefore necessary to define another threshold value T', with T > T', so that any provisionally relevant document whose relevance value falls into the interval [T', T] may be considered as capable of being improved. The expression for T' may be defined as:


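$$T' = 2T - 1 \qquad (4.6a)$$

$$\;\;\; = \max\left\{0.4,\; 1 - 2\left|\frac{\log r}{1 + r^2}\right| - 2\left|\frac{\log p}{1 + p^2}\right|\right\}. \qquad (4.6b)$$

As before, T' may be represented approximately by the following relations:

$$T' = \begin{cases} 1 - 2\left|\dfrac{\log r}{1 + r^2}\right| - 2\left|\dfrac{\log p}{1 + p^2}\right|, & \text{if } rp \ge 0.4, \quad (4.7a)\\ 0.4, & \text{if } rp < 0.4. \quad (4.7b) \end{cases}$$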


The effect of (4.6) is to bound T' in the interval [0.4, 1.0]. The values of T' as given by (4.7a) for values of r and p between 0.5 and 1.0 in steps of 0.1 are given in Table 4.2. The graphs of T' versus rp as defined by (4.7a) and (4.7b) are given in Fig. 4.2.


Table 4.2 Values of T'
It is worthwhile to note that the definitions of the threshold values given by (4.4) and (4.6a) are obtained empirically. A system operator is therefore free to modify these values according to need. Alternatively, the two threshold values may appear in the form of parameters to be supplied by the users. In the present system, we define the cutoff value $\tau$ to be in the interval [T', T]. Then, according to Definition 3.2, a document is relevant to a given query if and only if its relevance value lies in the closed interval $[\tau, 1]$.
Consequently, a query q is considered to be capable of being improved if there exists at least one document $d_i$ such that

$$T' \le R(d_i, q) < \tau. \qquad (4.8)$$


The set of significance values of index terms $Y = \{y_j \mid j = 1, 2, \ldots, m\}$ obtained by the automatic indexing algorithm, together with $\tau$, T and T', forms the set of system parameters S, which is denoted in set notation as

$$S = \{Y, \tau, T, T'\}. \qquad (4.9)$$
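The role of the two thresholds can be sketched as a partition of documents by their relevance values: hits satisfy $\tau \le R \le 1$, and documents satisfying (4.8) are candidates for improvement. Choosing $\tau = T$ in the example is a hypothetical choice within [T', T], and the relevance values are made up for illustration.

```python
from typing import Dict, Set, Tuple


def lower_threshold(T: float) -> float:
    """T' = 2T - 1 of (4.6a); T in [0.7, 1] gives T' in [0.4, 1]."""
    return 2.0 * T - 1.0


def partition(relevance: Dict[str, float], tau: float,
              T_prime: float) -> Tuple[Set[str], Set[str]]:
    """Split documents into hits (tau <= R <= 1) and improvable ones, (4.8)."""
    hits = {doc for doc, r in relevance.items() if tau <= r <= 1.0}
    improvable = {doc for doc, r in relevance.items() if T_prime <= r < tau}
    return hits, improvable


T = 0.85
T_prime = lower_threshold(T)           # 0.7, up to floating-point rounding
tau = T                                # hypothetical choice of tau within [T', T]
print(partition({"a": 0.95, "b": 0.80, "c": 0.30}, tau, T_prime))
```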
In document retrieval systems, the optimization of retrieval output is to find a set of documents which satisfy some pre-determined criteria utilizing some known parameters. In the present system, the two sets of parameters U and S play an important role in the optimization process. Consider a given query $q = (q_1, q_2, \ldots, q_m)$. Let $D_R = \{d_i \mid i = 1, 2, \ldots, h\}$ be the set of h documents which satisfy the relevance criterion $\tau \le R(d_i, q) \le 1$; similarly, let $D'_R = \{d_i \mid i = 1, 2, \ldots, h'\}$ be the set of h' documents which satisfy the condition of (4.8).

… relevant hits, and presented to the user in descending order of their degree of relevance. Now, suppose $D_R = \emptyset$ and $D'_R \ne \emptyset$. It is then possible to modify the original (or previously modified) search request and pass control back to the search phase to re-examine whether any of the documents in $D'_R$ may now be judged as relevant hits by the relevance criterion of (3.6). Define the most satisfactory document $d_c \in D'_R$ for improvement as the one such that

$$R(d_c, q) = \max_{d_i \in D'_R} R(d_i, q). \qquad (4.10)$$


A set of new symbols is introduced to facilitate the explanation of the algorithm. Let $q^{(k)}$ be the query in conjunction with the k-th iteration, for some $k = 0, 1, 2, \ldots$, such that $q^{(0)} = q$ and $q^{(k)} = (q_1^{(k)}, q_2^{(k)}, \ldots, q_m^{(k)})$. Suppose $D_R^{(k)} = \{d_i^{(k)} \mid i = 1, 2, \ldots, h^{(k)}\}$ is the set of $h^{(k)}$ documents judged relevant to $q^{(k)}$ by the relevance criterion $\tau \le R(d_i^{(k)}, q^{(k)}) \le 1$ in conjunction with the k-th iteration. Similarly, suppose $D'^{(k)}_R = \{d_i^{(k)} \mid i = 1, 2, \ldots, h'^{(k)}\}$ is the set of $h'^{(k)}$ documents judged relevant to $q^{(k)}$ by the condition of (4.8) in conjunction with the k-th iteration. Also, $D_R^{(0)} = D_R$ and $D'^{(0)}_R = D'_R$. Suppose $D_R^{(k)} = \emptyset$ and $D'^{(k)}_R \ne \emptyset$ for some k; then the (k+1)-st modified query $q^{(k+1)}$ can be defined as:


$$q^{(k+1)} = q^{(k)} + \alpha\,\Delta^{(k)}, \qquad (4.11)$$

where

$$\alpha = \ldots, \qquad (4.12)$$

$$\Delta^{(k)} = \ldots, \qquad (4.13)$$

$$a_i^{(k)} = \begin{cases} \ldots, & \text{if } q_i^{(k)} < d_{ci}^{(k)}, \quad (4.14a)\\ \ldots, & \text{if } q_i^{(k)} > d_{ci}^{(k)}, \quad (4.14b)\\ 0, & \text{if } q_i^{(k)} = d_{ci}^{(k)}, \quad (4.14c) \end{cases}$$

such that $q_i^{(k)}$ is the i-th element of $q^{(k)}$ and $d_{ci}^{(k)}$ is the i-th element of $d_c^{(k)}$, $i = 1, 2, \ldots, m$; and $d_c^{(k)} \in D'^{(k)}_R$ is such that

$$R(d_c^{(k)}, q^{(k)}) = \max_{d_i^{(k)} \in D'^{(k)}_R} R(d_i^{(k)}, q^{(k)}). \qquad (4.15)$$

If $D_R^{(k)} = \emptyset$ and $D'^{(k)}_R = \emptyset$ for some k, then the query q will be considered as having no hits. The sequence of queries $\{q^{(0)}, q^{(1)}, \ldots, q^{(k)}, \ldots\}$ performs the necessary iterative control over the determination of an optimum set of retrieved documents. This sequence can be proved to be convergent; the proof is given in the next section. In summary, the optimum iterative feedback algorithm is as follows:
STEP 1: Set k = 0. Receive $q^{(0)} = (q_1^{(0)}, q_2^{(0)}, \ldots, q_m^{(0)})$ and scale each $q_i^{(0)}$, $i = 1, 2, \ldots, m$, into the interval $[\min_j y_j, \max_j y_j]$.

STEP 2: Determine T and T'.

STEP 3: Examine the next document $d_i^{(k)}$, $i = 1, 2, \ldots, n$; at the end, go to Step 5.

STEP 4: (i) If $\tau \le R(d_i^{(k)}, q^{(k)}) \le 1$, form $D_R^{(k)}$; go to Step 3.
(ii) If $T' \le R(d_i^{(k)}, q^{(k)}) < \tau$, form $D'^{(k)}_R$; go to Step 3.
(iii) Otherwise, go to Step 3.

STEP 5: (i) If $D_R^{(k)} \ne \emptyset$, go to Step 6.
(ii) If $D_R^{(k)} = \emptyset$ and $D'^{(k)}_R = \emptyset$, go to Step 7.
(iii) Otherwise, determine $d_c^{(k)} \in D'^{(k)}_R$ such that $R(d_c^{(k)}, q^{(k)}) = \max_{d_i^{(k)} \in D'^{(k)}_R} R(d_i^{(k)}, q^{(k)})$, and set

$$q^{(k+1)} = q^{(k)} + \alpha\,\Delta^{(k)},$$

where $\alpha$ is as defined by (4.12), $\Delta^{(k)}$ is defined by (4.13), and $a_i^{(k)}$, $i = 1, 2, \ldots, m$, is defined by (4.14a), (4.14b) and (4.14c). Then increment k by one and go to Step 3.

STEP 6: Arrange $R(d_i^{(k)}, q^{(k)})$ for all $d_i^{(k)} \in D_R^{(k)}$, i.e., those with $\tau \le R(d_i^{(k)}, q^{(k)}) \le 1$, into descending order of relevance value and output the documents as relevant hits. Then go to Step 8.

STEP 7: Display "no hits" message.

STEP 8: STOP.
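The control flow of Steps 3 to 8 can be sketched as follows. Because the exact forms of $\alpha$, $\Delta^{(k)}$ and $a_i^{(k)}$ in (4.12) to (4.14) are not reproduced above, the query update here is a hypothetical stand-in: each weight moves by at most a step `alpha` toward the most satisfactory document $d_c$, and is left unchanged where $q_i = d_{ci}$, mirroring the three cases of (4.14). Step 1 scaling and Step 2 are assumed done by the caller, and the iteration cap is a safety addition that is not part of the algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


def relevance(d: Sequence[float], q: Sequence[float]) -> float:
    """R(d, q) of (3.7), clamped at 1 to absorb rounding."""
    nd = math.sqrt(sum(a * a for a in d))
    nq = math.sqrt(sum(b * b for b in q))
    if nd == 0 or nq == 0:
        return 0.0
    return min(1.0, sum(a * b for a, b in zip(d, q)) / (nd * nq))


def step_toward(q: List[float], d_c: Sequence[float], alpha: float) -> List[float]:
    """HYPOTHETICAL stand-in for (4.11)-(4.14): move each weight by at most
    alpha toward d_c; weights already equal to d_c stay put (cf. 4.14c)."""
    new_q = []
    for qi, dci in zip(q, d_c):
        step = min(alpha, abs(dci - qi))
        new_q.append(qi + step if dci > qi else qi - step)
    return new_q


def feedback_search(docs: Dict[str, Sequence[float]], q0: Sequence[float],
                    tau: float, T_prime: float, alpha: float = 0.1,
                    max_iter: int = 100) -> Tuple[List[Tuple[str, float]], int]:
    """Steps 3-8; returns (hits in descending relevance, iterations used)."""
    q = list(q0)
    for k in range(max_iter + 1):
        scores = {i: relevance(d, q) for i, d in docs.items()}        # Steps 3-4
        hits = [(i, r) for i, r in scores.items() if tau <= r <= 1.0]
        if hits:                                                       # Step 5(i) -> 6
            return sorted(hits, key=lambda ir: ir[1], reverse=True), k
        improvable = {i: r for i, r in scores.items() if T_prime <= r < tau}
        if not improvable:                                             # Step 7: no hits
            return [], k
        c = max(improvable, key=improvable.get)                        # (4.15)
        q = step_toward(q, docs[c], alpha)                             # Step 5(iii)
    return [], max_iter   # safety cap, not part of the original algorithm


docs = {"d1": [1.0, 0.0, 0.0], "d2": [0.6, 0.8, 0.0]}
hits, k = feedback_search(docs, q0=[0.0, 1.0, 1.0], tau=0.95, T_prime=0.4)
print(hits, k)
```

In the example no document passes the cutoff at first, but d2 falls in [T', τ), so the query is pulled toward d2 until d2 qualifies as a hit.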


Finally, the flowchart for the optimum iterative feedback algorithm is given in Fig. 4.3.
4.3 Convergence of the Algorithm

The above algorithm is convergent if there exists a document $d^* \in D$ such that, after a finite number n of iterations, $\tau \le R(d^*, q^{(n)}) \le 1$. In other words, if the set $D_R^{(n)}$ becomes non-empty after n iterations, n must have a lower bound and an upper bound. The proof is straightforward and is given as follows:


Proof

Suppose that after n iterations there exists a document $d^* \in D$ such that $\tau \le R(d^*, q^{(n)}) \le 1$. Then, by definition,

$$R(d^*, q^{(n)}) = \frac{d^* \cdot q^{(n)}}{\|d^*\|\,\|q^{(n)}\|} \le 1. \qquad (4.16)$$

We have, from (4.11),

$$d^* \cdot q^{(k+1)} = d^* \cdot q^{(k)} + \alpha\,(d^* \cdot \Delta^{(k)}). \qquad (4.17)$$

From (4.13), it is obvious that

$$\ldots \qquad (4.18)$$

…

$$d^* \cdot q^{(n)} \ge \ldots \qquad (4.20)$$

Similarly, we have

$$\|q^{(k+1)}\|^2 \le \|q^{(k)}\|^2 + 2\alpha\,(q^{(k)} \cdot \Delta^{(k)}) + \alpha^2 \|\Delta^{(k)}\|^2 \qquad (4.21)$$

$$= \ldots \qquad (4.22)$$

Suppose $\delta = \ldots$. Then, after n iterations,

$$\|q^{(n)}\| \le \sqrt{3n}\,\delta. \qquad (4.23)$$

Finally, combining (4.20) and (4.23), we obtain

$$\ldots \qquad (4.24)$$

where $\rho = \ldots$. Since $\delta$ and $\omega$ are finite, n is bounded.
It is noted that the algorithm described above takes into account document title terms only. In general, it is desirable to include search on other items such as author names, journal coden names, year of publication and so forth. Suppose there are t different search items other than document title terms. We may let the modified (or augmented) document vector $d_i^*$ be such that

$$d_i^* = (d_i, a_1, a_2, \ldots, a_t), \qquad (4.25)$$

and the modified (or augmented) query vector $q^*$ be such that

$$q^* = (q, b_1, b_2, \ldots, b_t), \qquad (4.26)$$

where $a_h$ and $b_h$, $h = 1, 2, \ldots, t$, are the subvectors associated with each search item, $d_i$ and q being the usual document vector and query vector respectively. Furthermore, each $a_h$ and $b_h$ has the same dimension, so that $d_i^*$ and $q^*$ are also of equal dimensions.


Without further knowledge of how much weight should be assigned to the elements of the subvectors $a_h$ and $b_h$, $h = 1, 2, \ldots, t$, we may assign a value to the j-th element of the h-th subvector, denoted by $a_h^{(j)}$, such that

$$a_h^{(j)} = \begin{cases} \ldots, & \text{if the j-th item of the h-th search type occurs in } d_i, \quad (4.27a)\\ 0, & \text{otherwise.} \quad (4.27b) \end{cases}$$

Similarly, we may assign a value to the j-th element of the h-th subvector, denoted by $b_h^{(j)}$, such that

$$b_h^{(j)} = \begin{cases} \ldots, & \text{if the j-th item of the h-th search type occurs in } q, \quad (4.28a)\\ 0, & \text{otherwise.} \quad (4.28b) \end{cases}$$

We may further assume that the values of $a_h^{(j)}$ and $b_h^{(j)}$, for all h and j, are invariant under query modifications. In other words, query modifications apply only to the subvector q in the new (or generalized) system, where all document vectors and query vectors are replaced by $d_i^*$ and $q^*$ respectively. The (k+1)-st modified query $q^{*(k+1)}$ is therefore symbolically represented by

$$q^{*(k+1)} = (q^{(k+1)}, b_1, b_2, \ldots, b_t). \qquad (4.29)$$
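The augmentation of (4.25) and (4.26) and the update rule (4.29) can be sketched as below. The value assigned in (4.27a) and (4.28a) is not given above; the sketch assumes 1.0 for an item that is present. The search-item vocabulary is hypothetical, and the same `augment` helper serves for documents and queries.

```python
from typing import Dict, List, Sequence, Set


def augment(title_vec: Sequence[float], vocab: Dict[str, List[str]],
            present: Dict[str, Set[str]]) -> List[float]:
    """Build (d, a_1, ..., a_t) as in (4.25): the title-term part followed by
    one subvector per search type, 1.0 where an item occurs (assumption)."""
    vec = list(title_vec)
    for kind in sorted(vocab):                   # fixed order of search types
        items = present.get(kind, set())
        vec.extend(1.0 if item in items else 0.0 for item in vocab[kind])
    return vec


def modify_title_part(q_star: Sequence[float], new_q: Sequence[float]) -> List[float]:
    """(4.29): only the title-term subvector changes; b_1, ..., b_t are kept."""
    return list(new_q) + list(q_star[len(new_q):])


vocab = {"A": ["SALTON G", "ROSENFELD A"], "Y": ["66", "71"]}
d_star = augment([0.5, 0.2], vocab, {"A": {"SALTON G"}, "Y": {"71"}})
print(d_star)                                    # [0.5, 0.2, 1.0, 0.0, 0.0, 1.0]
print(modify_title_part(d_star, [0.6, 0.1]))     # [0.6, 0.1, 1.0, 0.0, 0.0, 1.0]
```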


Consequently, the generalized system can be proved to be convergent in that
Conventionally, the standard evaluation measure of system effectiveness is defined in terms of recall and precision. Recall is the proportion of relevant documents retrieved, while precision is the proportion of retrieved documents actually relevant. Let

N = the total number of documents in D,
x = the number of documents retrieved and relevant,
y = the number of documents retrieved and not relevant,
z = the number of documents not retrieved and relevant.

Then the number of documents actually not retrieved and not relevant is N − x − y − z. We may construct a two-by-two contingency table as shown in Table 5.1 to represent the above situation.
                  Retrieved     Not-Retrieved

Relevant              x               z
Not-Relevant          y         N − x − y − z

Table 5.1 2-by-2 Contingency Table of Retrieval and Relevance


Then, by definition,

$$\text{Recall} = \frac{x}{x+z}, \qquad (5.1)$$

and

$$\text{Precision} = \frac{x}{x+y}. \qquad (5.2)$$
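As a small worked check of (5.1) and (5.2), the sketch below computes both ratios from the cell counts of Table 5.1. The counts used in the example are invented for illustration and are not results from this chapter.

```python
def recall_precision(x, y, z):
    """Recall and precision from the counts of Table 5.1.

    x: retrieved and relevant
    y: retrieved and not relevant
    z: not retrieved and relevant
    """
    recall = x / (x + z) if x + z else 0.0
    precision = x / (x + y) if x + y else 0.0
    return recall, precision


# Illustrative counts only.
r, p = recall_precision(x=12, y=4, z=8)
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.60 precision=0.75
```

Note that recall needs z, the number of relevant documents that were not retrieved, which is exactly the quantity that is costly to obtain for a large collection.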


It can be observed that for systems in which retrieval is not based on relevance, determining recall and precision according to (5.1) and (5.2) requires a tremendous amount of manual work, since z can only be found by judging the relevance of documents that were not retrieved. While this method of system evaluation may be acceptable for a small data base, it is impractical in normal situations where very large collections of data are involved. Consequently, system evaluation in terms of recall and precision may be treated as largely a theoretical measure.


As mentioned earlier, the prime objective of a document
retrieval system is to achieve the retrieval of all relevant, 
and only relevant, documents in response to any user's query. 
Hence, we would want to define a system evaluation measure in 
terms of relevance instead of recall and precision. Note that in 
either case, the meaning of relevance must be well-defined. In 
the present system, the meaning of relevance is given in 
Definition 3.2. Now, let 
D' = {Set of documents retrieved}, and 
D'' = {Set of documents not retrieved}. 


Then, by definition,

$$D = D' \cup D'' \qquad (5.3)$$

where $\cup$ represents the union of sets.


Suppose, as usual, q represents a query vector, and $d_i$, i = 1, 2, ..., N, represents a document vector in D. It can be seen from Table 5.1 that there are four cases:

I. $d_i \in D'$ and relevant;
II. $d_i \in D'$ and not relevant;
III. $d_i \in D''$ and relevant; and
IV. $d_i \in D''$ and not relevant.
al 
Obviously, we would like every $d_i$ retrieved in response to q to satisfy case I only, and every $d_i$ not retrieved in response to q to satisfy case IV only. In general, we may define a system evaluation measure E to be


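$$E = \delta_1 \sum_{\substack{d_i \in D' \\ \text{relevant}}} \left[1 - R(d_i, q)\right] + \delta_2 \sum_{\substack{d_i \in D' \\ \text{not relevant}}} R(d_i, q) + \delta_3 \sum_{\substack{d_i \in D'' \\ \text{relevant}}} \left[1 - R(d_i, q)\right] + \delta_4 \sum_{\substack{d_i \in D'' \\ \text{not relevant}}} R(d_i, q), \qquad (5.4)$$

where

$$\delta_1 = 1/x, \qquad (5.5a)$$
$$\delta_2 = 1/y, \qquad (5.5b)$$
$$\delta_3 = 1/z, \qquad (5.5c)$$
$$\delta_4 = 1/(N-x-y-z). \qquad (5.5d)$$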


Note that the value of z is unknown. However, there are techniques, such as statistical sampling methods, that may be applied to estimate z. Since relevance values are actually part of the calculations of the search process in the present system, the evaluation of E is quite simple and straightforward. Hence, it can be concluded that defining E in terms of relevance is far more practical than defining it in terms of recall and precision.
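A minimal sketch of E, under the assumption that each of the four sums in (5.4) averages a per-document penalty within its case, using the weights δ1 = 1/x, δ2 = 1/y, δ3 = 1/z and δ4 = 1/(N-x-y-z) of (5.5). The penalty is taken to be 1 - R(d_i, q) for relevant documents (cases I and III) and R(d_i, q) for non-relevant ones (cases II and IV); the relevance labels are assumed to be known, for example from a sample.

```python
def evaluation_measure(docs):
    """Sketch of E for a list of (retrieved, relevant, r) triples.

    r is the relevance value R(d_i, q) in [0, 1]. Assumed penalty:
    1 - r for relevant documents (cases I and III) and r for non-relevant
    ones (cases II and IV), averaged within each case, i.e. weighted by
    1/x, 1/y, 1/z and 1/(N - x - y - z). Empty cases are skipped.
    """
    penalties = {"I": [], "II": [], "III": [], "IV": []}
    for retrieved, relevant, r in docs:
        if retrieved and relevant:
            penalties["I"].append(1.0 - r)
        elif retrieved:
            penalties["II"].append(r)
        elif relevant:
            penalties["III"].append(1.0 - r)
        else:
            penalties["IV"].append(r)
    return sum(sum(p) / len(p) for p in penalties.values() if p)


# Illustrative data: one document in each of the four cases.
docs = [(True, True, 0.9), (True, False, 0.5), (False, True, 0.4), (False, False, 0.1)]
print(round(evaluation_measure(docs), 3))  # 1.3
```

Under this reading E reaches its minimum of 0 only when every relevant document has relevance 1 and every non-relevant document has relevance 0, which matches the aim of minimizing E.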


In any given document retrieval system, system performance is considered to be optimum if the value of a given system evaluation measure is a minimum or a maximum. For example, a system whose performance is based on recall and precision requires either the recall or the precision to be a maximum in order to achieve optimum system performance. According to the definition of E, optimizing the performance of any given system means minimizing E. As for the present iterative feedback system, it can be observed that the value of E tends to a minimum, and therefore the set of retrieved documents can be regarded as optimum in terms of system performance.


5.2 Analysis of Search Output 


The optimum iterative feedback algorithm and the generalized optimum iterative feedback algorithm are tested on an IBM 360/67 under the Michigan Terminal System (MTS). The programming language used is FORTRAN IV level G. In the paragraphs to follow, two sample search requests are given to illustrate the performance of the algorithms. Searches are performed on the first four hundred documents of CSDATA.


(I) The Optimum Iterative Feedback Algorithm 




Sample search request one is given as follows: 


QUE 60 70


TEST OF THE OPTIMUM ITERATIVE FEEDBACK ALGORITHM 


AND T DOCUMENT 800 
OR T INFORMATION 500 
OR T STORAGE 600 
OR T RETRIEVAL 400 


END 
The threshold values are calculated to be (T', T) = (0.466, 0.733), and the request vector after normalization becomes (0.674, 0.421, 0.505, 0.337), which corresponds to the terms (document, information, storage, retrieval). The following list of documents and their relevance values is the result of the search process:


Relevance Value   Document

0.715  AMDOA68190071VENEZ/STORA RETRI EDITI INFOR DICTI
0.677  AMDOA681903810CONN/RETRI ANSWE PROVI DOCUM
0.674  AMDOA67180249DRABH/DOCUM THAIL
0.674  AMDOA66170141SAVAG/USERS VERSU DOCUM
0.657  AMDOA68190173STARK/WHALE/CARSO/THOMP/GAF DOCUM STORA REPRE SYSTE
0.510  AMDOA65160005DALE /DALE /CLUMP EXPER ASSOC DOCUM RETRI
0.505  AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA RETRI SYSTE
0.501  AMDOA64150150KENT /HEURI INFOR RETRI GAME
0.491  AMDOA692003110CONN/INDEP AGREE RESOL DISAG ANSWE PROVI DOCUM
0.486  AMDOA67180010LIBBE/USE SECON ORDER DESCR DOCUM RETRI
0.475  AMDOA70210237KEITH/GENER EVALU INFOR STORA RETRI SYSTE
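The normalized request vector quoted for sample search request one, (0.674, 0.421, 0.505, 0.337), is reproduced by scaling the user weights (800, 500, 600, 400) to unit Euclidean length. The sketch below assumes that this is the normalization used; the function name is illustrative.

```python
import math


def normalize(weights):
    """Scale a weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


# Weights of sample search request one: (document, information, storage, retrieval).
print([round(v, 3) for v in normalize([800, 500, 600, 400])])
# [0.674, 0.421, 0.505, 0.337]
```

Because only the ratios of the weights survive normalization, multiplying all four user weights by the same factor would leave the search unchanged.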


Since the highest relevance value of this set of documents is 0.715, which is greater than 0.466 but less than 0.733, query modification will take place. The new request vector now becomes (0.674, 0.882, 0.851, 0.761, 0.0, 0.0), which corresponds to the terms (document, information, storage, retrieval, editing, dictionary). As the terms 'editing' and 'dictionary' are insignificant in CSDATA, their significance values are negligible. The value of $C_1$ is found to be 0.516. Consequently, the output of the second search is given as follows:


Relevance Value   Document

0.898  AMDOA68190071VENEZ/STORA RETRI EDITI INFOR DICTI
0.643  AMDOA68190173STARK/WHALE/CARSO/THOMP/GAF DOCUM STORA REPRE SYSTE
0.637  AMDOA681903810CONN/RETRI ANSWE PROVI DOCUM
0.634  AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA RETRI SYSTE
0.631  AMDOA65160163HEILP/GOODM/ANALO BETWE INFOR RETRI EDUCA
0.596  AMDOA70210237KEITH/GENER EVALU INFOR STORA RETRI SYSTE
0.554  AMDOA702100890TTEN/DEBON/METAS INFOR
0.549  AMDOA64150210KLEMP/METHO COMPA ANALY INFOR STORA RETRI SYSTE CRITI REVIE
0.542  AMDOA69200072SWETS/EFFEC INFOR RETRI METHO
0.521  AMDOA68190090SMITH/LEVY /PHYSI ORIEN METHO CLINI INFOR RETRI
0.517  AMDOA68190387POLLO/MEASU COMPA INFOR RETRI SYSTE
0.513  AMDOA70210154WININ/DATA STRUC INFOR RETRI
0.509  AMDOA70210004HUMPH/INFOR PEACE
0.483  AMDOA64150014VERHO/BELZE/USE META LANGU INFOR RETRI SYSTE
0.480  AMDOA68190275COTTR/EVALU COMPU SCIEN TECHN INFOR NUCLE SAFET INFOR CENTE
0.479  AMDOA65160005DALE /DALE /CLUMP EXPER ASSOC DOCUM RETRI
0.478  AMDOA68190404TREU /BROWS RETRI GAME
0.478  AMDOA65160291GARVI/INFOR SURVE MODER LINGU
0.474  AMDOA681902000CONN/QUEST CONCE INFOR NEED


The highest relevance value of this search is 0.898, which is greater than 0.733. Therefore, this set of documents will be regarded as the final search output for the given search request, using a cutoff value of 0.466. It is noted that the documents in this set are all related in one way or another to the subject of document/information storage and retrieval. In practice, if the estimated recall and precision values are varied, the number of hits will also change. For instance, if a higher demand for recall and precision is imposed, fewer documents will likely be considered relevant hits.
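The stopping rule used in these examples can be summarized as a small decision function: stop when the best relevance value exceeds T, modify the query when it falls in [T', T], and keep as hits every document at or above T'. The function name, the dictionary input and the "none" branch (no document reaches T', a case this chapter does not discuss) are assumptions of this sketch; the sample values are taken from the two searches above, with the document identifiers shortened to the author field.

```python
def next_step(relevance, t_low=0.466, t_high=0.733):
    """Classify one search result under the threshold rule of this section.

    relevance maps document ids to relevance values. Returns (action, hits),
    where hits are the ids with relevance >= t_low, best first.
    """
    hits = sorted((d for d, r in relevance.items() if r >= t_low),
                  key=lambda d: relevance[d], reverse=True)
    if not hits:
        return "none", []      # assumption: this case is not discussed here
    if relevance[hits[0]] > t_high:
        return "stop", hits    # accept hits as the final search output
    return "modify", hits      # best value in [t_low, t_high]: modify the query


first = {"VENEZ": 0.715, "OCONN": 0.677, "KEITH": 0.475}
second = {"VENEZ": 0.898, "STARK": 0.643, "OCONN": 0.474}
print(next_step(first))   # ('modify', ['VENEZ', 'OCONN', 'KEITH'])
print(next_step(second))  # ('stop', ['VENEZ', 'STARK', 'OCONN'])
```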


It is important to realize that the search process depends a great deal on the term weights assigned by the user. Suppose that, in the above search request, the weights are changed to (892, 822, 669, 630) for the terms (document, information, storage, retrieval). These weights are obtained from the set of index terms and their significance values. Using the same estimated recall and precision values, it is found that no iterative search is required. The search output is as follows:
Relevance Value   Document

0.910  AMDOA68190071VENEZ/STORA RETRI EDITI INFOR DICTI
0.742  AMDOA64150150KENT /HEURI INFOR RETRI GAME
0.688  AMDOA65160163HEILP/GOODM/ANALO BETWE INFOR RETRI EDUCA
0.681  AMDOA681903810CONN/RETRI ANSWE PROVI DOCUM
0.643  AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA RETRI SYSTE
0.631  AMDOA68190173STARK/WHALE/CARSO/THOMP/GAF DOCUM STORA REPRE SYSTE
0.604  AMDOA70210237KEITH/GENER EVALU INFOR STORA RETRI SYSTE
0.591  AMDOA69200072SWETS/EFFEC INFOR RETRI METHO
0.586  AMDOA68190286MILLE/PSYCH INFOR
0.586  AMDOA702100890TTEN/DEBON/METAS INFOR
0.568  AMDOA68190090SMITH/LEVY /PHYSI ORIEN METHO CLINI INFOR RETRI
0.564  AMDOA68190387POLLO/MEASU COMPA INFOR RETRI SYSTE
0.559  AMDOA70210145WININ/DATA STRUC INFOR RETRI
[illegible]  AMDOA64150210KLEMP/METHO COMPA ANALY INFOR STORA RETRI SYSTE CRITI REVIE
0.540  AMDOA68190404TREU /BROWS RETRI GAME
0.538  AMDOA70210004HUMPH/INFOR PEACE
[illegible]  AMDOA64150014VERHO/BELZE/USE META LANGU INFOR RETRI SYSTE
[illegible]  AMDOA65160005DALE /DALE /CLUMP EXPER ASSOC DOCUM RETRI
0.509  AMDOA68190375COTTR/EVALU COMPR SCIEN TECHN INFOR NUCLE SAFET INFOR CENTE
0.506  AMDOA65160291GARVI/INFOR SURVE MODER LINGU
0.503  AMDOA67180235BUCHA/HUTTO/ANALY AUTOM HANDL TECHN INFOR NUCLE SAFET INFOR
0.502  AMDOA681902000CONN/QUEST CONCE INFOR NEED
0.492  AMDOA70210095BROMB/ECONO INFOR
0.488  AMDOA67180010LIBBE/USE SECON ORDER DESCR DOCUM RETRI
0.484  AMDOA70210385COOPE/DERIV DESIG EQUAT INFOR RETRI SYSTE
0.479  AMDOA69200169FLANI/OPEN ENDED INFOR RETRI SYSTE INCLU SELEC DATA COLLE
0.470  AMDOA69200039LUNIN/ACADE INFOR CENTE
0.468  AMDOA68190305THOMP/ORGAN INFOR


It is interesting to note that the previous set of search output is a subset of this set. Because this set depends heavily on the system parameters, it may contain some documents the user did not expect. Therefore, in order to receive the most satisfactory search output, the user needs to exercise good judgment in choosing term weights.


(II) The Generalized Optimum Iterative Feedback Algorithm 




By modifying sample search request one, we obtain sample search request two, which is given as follows:
QUE 60 70
TEST OF THE GENERALIZED OPTIMUM ITERATIVE FEEDBACK ALGORITHM

AND A BAKER
OR C AMDOA
OR Y 68
OR T DOCUMENT 800
OR T INFORMATION 500
OR T STORAGE 600
OR T RETRIEVAL 400
END


The threshold values and the weights assigned to terms are unchanged. It is found that only one document has a relevance value above the threshold value 0.466; it is given as follows:


Relevance Value   Document

0.534  AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA RETRI SYSTE


As before, since the relevance value of the retrieved document lies in the interval [0.466, 0.733], the original query vector is modified. The new query vector includes the terms (information, retrieval, storage, document, use, simulation, study, system) with the corresponding new set of weights (0.534, 0.441, 0.590, 0.674, 0.000, 0.767, 0.598, 0.992). Since the term 'use' is insignificant in CSDATA, its significance value is zero. The value of $C_1$ is found to be 0.126. The following list of documents and their relevance values is the result of the second search:


Relevance Value   Document

0.739  AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA RETRI SYSTE
0.503  AMDOA69200203BAKER/OPTIM USER SEARC SEQUE IMPLI INFOR SYSTE OPERA
0.483  AMDOA69200027LESK /WORD WORD ASSOC DOCUM RETRI SYSTE
0.483  AMDOA70210330CAGAN/HIGHL ASSOC DOCUM RETRI SYSTE
0.477  AMDOA67180216FLOOD/ANALY QUEST ASKED MEDIC REFER RETRI SYSTE COMPA QUEST SYSTE TERMI
0.476  AMDOA70210237KEITH/GENER EVALU INFOR STORA RETRI SYSTE
0.474  AMDOA68190120CARAS/COMPU SIMUL SMALL INFOR SYSTE
0.469  AMDOA67180055BAKER/HAEFE/RECKH/FILM SYSTE DUPLI TERMA CARDS


The highest relevance value of this search is 0.739, which is greater than the threshold value 0.733. Therefore, this set of documents is regarded as the final search output for the given sample search request.


Sample search request two is now modified so that the weights corresponding to (document, information, storage, retrieval) become (892, 822, 669, 630). As before, these weights are obtained from the set of index terms and their significance values. Unlike the case given in (I), two searches are required to bring forth an optimum set of search output. The first set of documents is given as follows:


Relevance Value   Document

0.582  AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA RETRI SYSTE
0.510  AMDOA68190071VENEZ/STORA RETRI EDITI INFOR DICTI


After query modification, the new request vector includes the terms (information, retrieval, storage, document, use, simulation, study, system), which correspond to the weights (0.729, 0.672, 0.547, 0.414, [illegible], 0.760, 0.598, 0.992). The value of $C_1$ is found to be 0.161. Consequently, the final search output is:


Relevance Value   Document

[illegible]  AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA RETRI SYSTE
[illegible]  AMDOA69200203BAKER/OPTIM USER SEARC SEQUE IMPLI INFOR SYSTE OPERA
0.519  AMDOA70210237KEITH/GENER EVALU INFOR STORA RETRI SYSTE
0.517  AMDOA68190387POLLO/MEASU COMPA INFOR RETRI SYSTE
0.498  AMDOA67180216FLOOD/ANALY QUEST ASKED MEDIC REFER RETRI SYSTE COMPA QUEST SYSTE TERMI
0.494  AMDOA68190120CARAS/COMPU SIMUL SMALL INFOR SYSTE
0.493  AMDOA64150210KLEMP/METHO COMPA ANALY INFOR STORA RETRI SYSTE CRITI REVIE
0.481  AMDOA69200027LESK /WORD WORD ASSOC DOCUM RETRI SYSTE
0.481  AMDOA70210330CAGAN/HIGHL ASSOC DOCUM RETRI SYSTE
0.477  AMDOA66170026PARKE/USERS PLACE INFOR SYSTE
0.477  AMDOA70210385COOPE/DERIV DESIG EQUAT INFOR RETRI SYSTE
0.474  AMDOA69200169FLANI/OPEN ENDED INFOR RETRI SYSTE INCLU SELEC DATA COLLE
0.466  AMDOA70210274BURCH/ROLE FEDER GOVER INFOR SYSTE EDUCA


By comparing the two sets of search outputs corresponding to the two search requests that use different sets of term weights, one can easily draw the same conclusions as discussed in (I). Besides being very dependent on the term weights assigned by the user, the search process also depends to a certain extent on the estimated recall and precision values supplied by the user. The major difference between the performance of the two algorithms may be summarized by stating that the more specific the search request, the more selective the search output will tend to be.


Note that the examples given above are one-parameter questions. When more than one parameter is specified in one question, the parameters are treated as mutually exclusive; that is, each parameter is considered as one query vector. The final search output then consists of all the hits from the different query vectors. Further search examples are included in Appendix E for reference.
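Combining the hits of several query vectors into one output can be sketched as a union over per-query result sets. The text only states that all hits are included; keeping the highest relevance value when the same document is hit by more than one query vector is an assumption of this sketch, as are the function name and the sample data.

```python
def combine_hits(hit_lists):
    """Union of the hits from several query vectors.

    Each element of hit_lists maps document id -> relevance value for one
    query vector. A document found by more than one query keeps its highest
    value (an assumption; the text only says all hits are included).
    """
    best = {}
    for hits in hit_lists:
        for doc, r in hits.items():
            if doc not in best or r > best[doc]:
                best[doc] = r
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


print(combine_hits([{"d1": 0.70, "d2": 0.50}, {"d2": 0.60, "d3": 0.48}]))
# [('d1', 0.7), ('d2', 0.6), ('d3', 0.48)]
```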


5.3 Conclusions 




It has been shown that both the optimum iterative feedback algorithm and the generalized optimum iterative feedback algorithm are capable of retrieving an optimum set of search output. One attraction of the algorithms is that no iterative search is necessary if the user's search request is already good enough. The only drawback is the additional search time required for iterative searches when a poor search request is encountered. However, in many cases, a maximum of two searches is probably sufficient; none of the sample requests above needed more than two. Therefore, the price paid for a poor search request in return for an optimum set of search output is, after all, not too discouraging.


It is worthwhile to note that the automatic indexing algorithm developed in this thesis may be further investigated for the possibility of automatically generating a thesaurus, which plays an important role in modern information storage and retrieval. By converting the index term list and the significance values into a property vector, an algorithm can be developed for automatic recognition of synonyms. Lastly, the automatic indexing algorithm may prove to be very useful in many applications of information handling systems.
APPENDIX A

COMPUTING SCIENCE DATA BASE OF JOURNAL ARTICLES
APPROXIMATELY 7,000 ARTICLES ON FILE, FEBRUARY 1972

AUTHOR NAMES AND TITLE WORDS TRUNCATED TO FIVE CHARACTERS
AUTHOR NAMES ARE FOLLOWED BY /
JOURNAL NAMES ARE REPRESENTED BY ASTM CODENS

CODEN        Journal Name (and period covered)

ACJ          Australian Computer Journal (1967-70). No ASTM coden.
AMDOA        American Documentation (1964-70).
ASLPA        Assoc. of Special Libraries and Informn. Bureau, Aslib Proc. (1964-70).
ATCAA        Automatica (1964-69).
AURCA        Automation and Remote Control (1968-69).
[illegible]  Bit (1964-70). No ASTM coden.
CACMA        Communications of the ACM (1964-70).
CBMRB        Computers and Biomedical Research (1968-70).
CMPJA        Computer Journal (1964-70).
COBUA        Computer Bulletin (1965-70).
COMTB        Computing (1967-69).
DATMN        Datamation (1970). Non-ASTM coden used in error.
DIMNA        Datamation (1967-69).
ECECA        Economics Comp. and Econ. Cybernetics Studies and Research (1968-70).
ENCYA        Engineering Cybernetics (1969-70).
IBMJA        IBM Journal of Research and Development (1969-70).
IBMSA        International Business Machines, Systems Journal (1962-63).
ICCBA        ICC Bulletin (1964-67).
IETTA        IEEE Transactions on Information Theory (1969-70).
IFCNA        Information and Control (1964-70).
IFSRA        Information Storage and Retrieval (1966-69).
IJCMA        International Journal of Computer Mathematics (1968).
IJCOA        International Journal of Control (1969-70).
INPJB        Information Processing in Japan (1966-69).
ITCOB        IEEE Transactions on Computers (1969-70).
JACOA        Journal of the Association for Computing Machinery (1964-70).
JCHDA        Journal of Chemical Documentation (1961-69).
JCSSB        Journal of Computer and Systems Sciences (1969).
JDOCA        Journal of Documentation (1963, 1965-68).
JIMBA        SIAM Journal, Series B, Numerical Analysis (1969-70).
JLAUA        Journal of Library Automation (1968-70).
PAT          Pattern Recognition (1968-70). No ASTM coden.
PRITA        Problems of Information Transmission (1965-67).
APPENDIX B

The following statistical association measures are commonly used. The individual source is included in square brackets to the right. The interpretation of symbols is as follows:

$c_{ij}$ = the extent to which term$_i$ is associated with term$_j$,
$f_{ij}$ = the frequency of co-occurrence of term$_i$ and term$_j$,
$f_i$ = the frequency of occurrence of term$_i$,
n = the total number of documents in the collection.
= ; are ie 1 
ve Cij £47 at, fs,) [ J 
= { - - 2 - - : 16 
Bus oe ete EN Tiree ea n/2) ee SY £4) (a Sp L ] 
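All of these measures are built from the same three quantities, n, $f_i$ and $f_{ij}$. The sketch below counts them from a collection in which each document is given as a set of truncated title terms. Interpreting "frequency" as the number of documents containing a term (each term counted at most once per document) is an assumption, and the sample documents are illustrative.

```python
from collections import Counter
from itertools import combinations


def term_statistics(documents):
    """Return n, f and f_pair for documents given as collections of terms.

    f[i]          number of documents containing term i
    f_pair[(i,j)] number of documents containing both i and j (i < j)
    Counting each term at most once per document is an assumption.
    """
    n = len(documents)
    f = Counter()
    f_pair = Counter()
    for terms in documents:
        unique = sorted(set(terms))
        f.update(unique)
        f_pair.update(combinations(unique, 2))
    return n, f, f_pair


docs = [{"INFOR", "RETRI", "SYSTE"}, {"INFOR", "STORA", "RETRI"}, {"DOCUM", "RETRI"}]
n, f, f_pair = term_statistics(docs)
print(n, f["RETRI"], f["INFOR"], f_pair[("INFOR", "RETRI")])  # 3 3 2 2
```

Storing pairs with the terms in sorted order means each unordered pair has a single key, so $f_{ij} = f_{ji}$ holds automatically.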
APPENDIX C

INDEX TERM   SIGNIFICANCE VALUE

SYSTE  0.994
CONTR  0.931
PROBL  0.904
AUTOM  0.897
LINEA  0.887
ALGOR  0.859
MULTI  0.848
THEOR  0.837
CODES  0.816
SEQUE  0.801
VARIA  0.789
INTEG  0.784
EXPER  [illegible]
NONLI  0.749
RECOG  0.728
LIBRA  0.722
STAND  0.715
COMPA  0.704
MODEL  0.702
ALGOL  0.694
CHANN  0.689
ITERA  0.686
UNIVE  0.680
FORMA  0.678
CIRCU  0.674
STORA  0.669
BINAR  0.664
SERVI  0.659
FEEDB  0.654
PARAM  0.639
DEFIN  0.636
SYNTA  0.633
SUBJE  0.632
CONTE  0.627
ELECT  0.622
EFFEC  0.618
SELEC  0.615
PRODU  0.611
LIMIT  0.606
DECIS  0.602
REDUC  0.595
DERIV  0.591
SERIA  0.587
ORIEN  0.583


PROCE  0.980
TRANS  0.931
DATA   0.904
OPTIM  0.891
STRUC  0.872
CHEMI  0.858
APPLI  0.846
OPERA  0.833
APPRO  0.809
EQUAT  0.795
MATRI  0.787
SIMUL  0.784
STABI  0.768
AMERI  0.749
PROBA  0.727
NUMBE  0.720
ESTIM  0.713
LOGIC  0.703
PROPO  0.700
ORDER  0.693
PROPE  0.686
RELAT  0.685
DISTR  0.680
FILTE  0.677
REGUL  0.673
STOCH  0.668
ALGEB  0.663
NETWO  0.658
LARGE  0.648
ABSTR  0.637
MAGNE  0.636
MEDIC  0.632
DIMEN  0.631
DOCUM  0.626
POINT  0.620
DISPL  0.617
CHARA  0.614
IBM    0.611
ARITH  0.605
CONTI  0.601
FORM   0.594
PERFO  0.591
PLATE  0.585
OUTPU  0.563


TECHN  0.941
METHO  0.929
INTER  0.899
FUNCT  0.890
PROGR  0.866
INDEX  0.856
ERROR  0.841
CLASS  0.832
CORRE  0.802
DIGIT  0.795
LITER  0.786
DIFFE  0.781
STATE  0.756
SIGNA  0.736
NUMER  0.723
FINIT  0.719
STATI  0.707
DYNAM  0.703
FORTR  0.697
ORGAN  0.693
DETER  0.686
MEMOR  0.684
NOISE  0.679
BOOLE  0.675
MINIM  0.673
FREE   0.665
INPUT  0.662
CODE   0.658
SCHEM  0.647
DEVEL  0.637
COMPL  0.635
POLYN  0.632
FORMU  0.629
SYNTH  0.626
BOUND  0.620
COMPI  0.617
SINGL  0.614
TABLE  0.609
INVER  0.605
TAPE   0.597
VALUE  0.593
ADAPT  0.590
MODUL  0.585
PATTE  0.583
TESTI  0.583
BUSIN  0.582
SWITC  0.576
SOCIA  0.570
DESCR  0.566
NORMA  0.563
REPRE  0.560
FACTO  0.995
CATAL  0.551
NON    0.546
USER   0.545
IDENT  0.542
MANIP  0.541
SYMPO  0.538
CENTE  0.531
SYMME  0.528
SURVE  [illegible]
PHYSI  0.523
BASE   0.519
HARMO  0.518
PREFI  0.515
COLLA  0.515
PL1    0.515
CARDS  0.514
REFER  [illegible]
MONIT  0.506
EDUCA  0.501
REMOT  0.496
PLANN  0.489
LOCAL  0.485
ASSIG  0.480
PUBLI  0.478
SENSI  0.476
SOLVI  0.470
NATUR  0.468
PULSE  0.464
PRINC  0.460
HIGH   0.456
SELF   0.453
FACIL  0.451
ALLOC  0.450
CASE   0.444
INDEP  0.442
QUALI  0.438
FLOWC  0.434
ADDIT  0.432
REALI  0.430
WRITI  0.428
360    0.422
RESOU  0.417
REGIO  0.413
PSEUD  0.411
DELAY  0.407
DOMAI  0.405
COBOL  0.403


EIGEN  0.582
PHASE  0.581
ECONO  0.572
ACTIV  0.568
ASYMP  0.564
ABSOL  0.562
MATHE  0.560
SERIE  0.556
SMALL  0.551
SPACE  0.546
TYPE   0.544
EQUIV  0.542
INDUS  0.540
SYNCH  0.538
PAPER  0.530
RAPID  0.528
STUDI  0.525
FIELD  0.522
QUADR  0.518
PARSE  0.518
REDUN  0.515
FREDH  0.515
KONVE  0.512
PARAL  0.514
TERRE  [illegible]
CITAT  0.505
SET    0.501
ONLIN  0.496
PERIO  0.488
RESPO  0.482
QUEUE  0.480
PIECE  0.478
REGIS  0.474
ROOTS  0.469
LOOP   0.466
ASLIB  0.464
PERMU  0.458
CAPAC  0.454
DIAGN  0.453
IMPLE  0.451
RELEV  0.448
FAST   0.443
LINKS  0.442
MARKO  0.436
CHART  0.433
FUNDA  0.432
ASA    0.430
PREDI  0.428
NETS   0.422
DECOM  0.417
CONDU  0.412
REVIE  0.410
RIGHT  0.405
FINDI  0.405
BIBLI  0.401


PRINT        0.562
ENGIN        0.577
BASED        0.572
PRESE        0.567
[illegible]  0.564
COLLE        0.561
COMPO        0.560
THRES        0.554
BLOCK        0.549
CONCE        0.546
ANALO        0.543
SOFTW        0.542
REPOR        0.538
CYCLI        0.538
GROUP        0.530
RESUL        [illegible]
RUNGE        0.524
DEVIC        0.521
DEPAR        0.518
RECTA        0.518
FACT         0.515
INSUR        0.515
CARD         0.514
COMME        0.514
DETEC        [illegible]
ADMIN        0.505
PURPO        0.500
QUANT        0.495
PACKA        0.487
EXTRA        0.482
CODIN        0.478
ASSOC        0.477
CONSI        0.473
HEURI        0.469
SPEED        0.466
HANDL        0.463
COEFF        0.456
DIVIS        0.454
SUCCE        0.451
TERM         0.451
INVAR        0.447
POSIT        0.442
DECOD        0.439
PRECI        0.436
TOWAR        0.433
REQUI        0.431
BOOKS        0.428
DUAL         0.426
DEPEN        0.418
PREPA        0.415
WEIGH        0.411
VECTO        0.408
RANK         0.405
CURRI        0.404
ARTIC        0.401


HYBRI  0.282
PRACT  0.577
TE     0.571
SQUAR  [illegible]
ACCES  0.563
PATEN  0.561
TaoL   0.560
CENTR  0.552
ASPEC  0.548
IMPRO  0.545
AMPLI  0.542
CRITE  0.542
GRAMM  0.538
NOTE   0.537
CONFE  0.528
SEPAR  0.526
NATIO  0.523
ALPHA  0.520
FILM   0.518
BRIEF  0.515
ARGUM  0.515
DEVIA  0.515
LAW    0.514
VIEW   0.512
PUNCH  0.509
TEACH  0.504
AWARE  0.497
MARKE  0.494
COMPR  0.485
SECON  0.480
HAND   0.478
SETS   0.477
RECUR  0.472
GAMES  0.469
GAUSS  0.465
PICT   0.461
FOURI  0.456
QUASI  0.454
ALTER  0.451
PROFE  0.450
STEP   0.444
UTILI  0.442
INTRO  0.439
STRAT  0.434
RELAY  0.432
EXPAN  0.431
EQUIP  0.428
JOURN  0.426
SHIFT  0.418
CHEBY  0.414
TITLE  0.411
ACCOU  0.408
PASS   0.405
LIST   0.404
SCHOO  0.399


EXTEN   0.399
DIAGR   [illegible]
DISCO   0.395
WATER   0.395
NONPA   0.395
ANNOU   0.395
PAGED   0.395
EXTRE   0.394
SCALE   0.392
ORTHO   0.386
LENGT   0.383
PROBS   0.378
CEP     0.378
PUEFPT  0.378
RIGID   0.378
CHOMS   0.378
UNSTE   0.378
SEBPP   0.378
BURT    0.378
SIMON   0.378
APPRA   0.378
REPLA   0.378
HOT     0.378
GAIN    0.378
FIRM    0.378
TROUB   0.378
LATEN   0.378
UNSUP   0.378
ORANT   0.378
TRIAN   0.378
SUBOP   0.378
SYLLA   0.378
SIZE    0.378
MARKU   0.378
OCCUR   0.378
ORBIT   0.378
PACKE   0.378
MERGI   0.378
SCHOL   0.378
AMEND   0.378
MERCU   0.378
CHICA   0.378
COLOU   0.378
APPRE   0.378
BROMB   0.378
SPATI   0.378
BINDI   0.378
VISIO   0.378
DUPLE   0.378
INCOR   0.378
ATTEN   0.378
DISAG   0.378
TOWER   0.378
RESIS   0.378
SARDI   0.378


MOTIO   0.398
LEVEL   0.398
MAJOR   0.395
JAPAN   0.395
KDF9    0.395
EXECU   0.395
FIXED   0.395
PHRAS   0.394
Sstisi  0.388
CHEIC   [illegible]
SUBMI   0.378
COURT   0.378
RECEP   0.378
CALL    0.378
POLES   0.378
POLIT   0.378
PREFE   0.378
LOCUS   0.378
RCA     0.378
PROLE   0.378
RAY     0.378
MARK    0.378
RC 400  0.378
PREVI   0.378
PRicl   0.378
DORN    0.378
PEACE   0.378
PAGE    0.378
ONTAR   0.378
ELLIO   0.378
PARIT   0.378
PROCR   0.378
TRACK   0.378
ACOUT   0.378
ACMCP   0.378
AGENT   0.378
PANTA   0.378
ANTIC   0.378
FILE    0.378
FCC     0.378
PANEL   0.378
SHOP    0.378
CONFR   0.378
UNSTA   0.378
LANCZ   0.378
MARKS   0.378
MASTE   0.378
MANUS   0.378
SIDES   0.378
GIER    0.378
LANGE   0.378
AUTON   0.378
KOUSI   0.378
HILL    0.378
LENSE   0.378


COST    0.398
PLANE   0.396
SORT    0.395
ACADE   0.395
KEYWO   0.395
BOARD   0.395
GROWT   0.395
SYMBO   0.392
REMAR   0.387
ELIMI   0.385
BELL    0.378
BORDE   0.378
PROMO   0.378
JORDA   0.378
ALLIE   0.378
ERKRAT  0.378
BOND    0.378
AERON   0.378
SHOCK   0.378
JACKS   0.378
TRI     0.378
HOBBS   0.378
SLEEPS  0.378
HELIC   0.378
SESSI   0.378
P@STi   0.378
RENAM   0.378
PLAN    0.378
SYNAP   0.378
MAGNU   0.378
TELEC   0.378
NONCO   0.378
PARTY   0.378
STAR    0.378
EXCLU   0.378
BLTS    0.378
OHIO    0.378
CANCE   0.378
TRIAL   0.378
VIROL   0.378
ATTEM   0.378
TREAT   0.378
LOAD    0.378
LONDO   0.378
APL     0.378
BASSA   0.378
VOYSE   0.378
WILSO   0.378
2314    0.378
LABS    0.378
UNVOL   0.378
AUDIO   0.378
ANCMA   0.378
ROBIN   0.378
DUALI   0.378


PERSO   0.398
INQUI   0.396
SUM     0.395
SERVO   0.395
TUTOR   0.395
DISTO   0.395
SCHED   0.395
SUBRE   [illegible]
MEETI   0.387
PRIOR   0.384
BACKW   0.378
POLLU   0.378
MONOI   0.378
DREDG   0.378
CULTU   0.378
BIVAR   0.378
POSTA   0.378
TANK    0.378
DECOC   0.378
ens     0.378
RACHF   0.378
DEMON   0.378
DISCI   0.378
DOMIN   0.378
REGEL   0.378
SLT     0.378
ECMA    0.378
CANAD   0.378
PEREKA  0.378
SIZED   0.378
ENDED   0.378
BANDS   0.378
CORNE   0.378
SHEFF   0.378
MEANI   0.378
ROMAN   0.378
PROGA   0.378
ORDNU   0.378
I/O     0.378
OPTIO   0.378
DIGES   0.378
feck    0.378
CARTO   0.378
LOSSE   0.378
SHELL   0.378
JUMP    0.378
BAIRS   0.378
ROSEN   0.378
KOSHE   0.378
UNCON   0.378
BROWN   0.378
LAYOU   0.378
FIDEL   0.378
KNUTH   0.378
SEMIG   0.378
LOREV


ROYAL 
FERRI 
TANDE 
vod Pe 
HAMMI 
ICL 

MODAL 
IMPRE 
DECAD 
LONIN 
JENKI 
INDIR 
LAGS 
HARD 
HIRSC 
HADAM 
REGIM 
LONGE 
FREED 
PERT 
HETER 
DOUGL 
DEEP 
MINER 
DRUGS 
EMPHA 
BETA 
ERASL 
DECID 
DRIFT 
DATAF 
CLARI 
SIGNE 
CBAC 

LAGRA 
SINGU 
WORKI 
RACHE 
KORZH 
EVIDE 
CHURC 
AREAS 
SIMSC 
STAGE 
SCALI 
TREE 

POSSI 
KUTTA 
REGAR 
FILLI 


0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
0.378 
Coats 
0.378 
0.378 
0.378 
Pet Re. 
0.378 
0.378 
0.378 
0.378 
Ore We 
0.378 
0.378 
0.378 
0.378 
0.377 
0.370 
0.367 
0.867 
0.367 
0.367 
0.367 
0.367 
2361 
0.367 
0.361 
0.356 
0.351 
0.343 
0.342 


SIDE    0.378
SUBSE   0.378
SLOW    0.378
CLENS   0.378
NONDE   0.378
MILNE   0.378
MORPH   0.378
METAB   0.378
MERGE   0.378
INTEL   0.378
IRRED   0.378
KEIO    0.378
ISSUE   0.378
INACC   0.378
ILL     0.378
HIGHE   0.378
HOHER   0.378
PAREN   0.378
RISK    0.378
HIDDE   0.378
PENTO   0.378
FLOWS   0.378
FLOWER  0.378
GLYCO   0.378
EINSC   0.378
EDELM   0.378
MAPS    0.378
DEFLE   0.378
DESst   0.378
DATAN   0.378
DEFER   0.378
MOORE   0.378
NORM    0.378
MAEHL   0.378
MICHT   0.378
FLOW    0.375
CAVIT   0.367
MCVIN   0.367
IMPED   0.367
BRAIN   0.367
BLEND   0.367
TOLER   0.367
VERTE   0.367
RETRO   0.367
DISSE   0.367
ENCOD   0.361
TIMET   0.355
EVENT   0.345
PEOPL   0.343
INSTR   0.339


GROSS   0.378
BANKI   0.378
NET     0.378
NONSTI  0.378
NONRE   0.378
FLUX    0.378
MODES   0.378
MATEM   0.378
MERSE   0.378
PREVE   0.378
IONES   0.378
POLYP   0.378
JONES   0.378
RAPHS   0.378
Let     0.378
REED    0.378
EASTM   0.378
ESSO    0.378
HASH    0.378
HABIT   0.378
FRANK   0.378
FRANC   0.378
GAAS    0.378
FLORE   0.378
LOADI   0.378
EXCHA   0.378
FLEXT   0.378
AREA    0.378
PRACN   0.378
LAYMA   0.378
FIGUR   0.378
DEPOS   0.378
DONNE   0.378
DATAM   0.378
COSMI   0.378
GUIDA   0.374
RESER   0.367
ASYMM   0.367
DEMOD   0.367
CONFO   0.367
AUTO    0.367
TELEM   0.367
VIEWP   0.367
SORTE   0.367
THESA   0.367
LAPLA   0.360
EXPRE   0.355
NUCLE   0.345
DECEN   0.343
by usin


SISTE 
METHO 
PROBL 
ANALY 
LINEA 
Fine 
MULTI 
ERROR 
CODES 
EVALU 
MATRI 
CONST 
EXPER 
AMERI 
RECOG 
BINLIT 
SCIEN 
LOGIC 
FORTR 
GRAPH 
ITERA 
CHANN 
NOISE 
UNIVE 
MICRO 
EFFIC 
STOCH 
DISCR 
SCHEM 
CONDI 
DEVEL 
SYNTA 
MEDIC 
DOCUM 
DIREC 
BOUND 
COMPI 
BLT 
LIMIT 
SPECI 
PERFO 
SOURC 
PRINT 
FREQU 
MECHA 
ia 
SQUAR 
ACTIV 




COMPU 
TECHN 
INFOR 
INTER 
LANGU 
CHEMI 
RETRI 
MACHT 
SOLUT 
EQUAT 
VARIA 
DLPFE 
DESIG 
STATE 
PROBA 
LIBRA 
NUMER 
PROPO 
COMPA 
ORDER 
ORGAN 
CALCU 
FILTE 
FORMA 
MINIM 
FREE 
SERVI 
BINAR 
CODE 

PARAM 
CERTA 
POLYN 
ABSTR 
ELECT 
FORMU 
DISPL 
SYNTH 
TABLE 
IBM 

REDUC 
VALUE 
DERIV 
HY BRI 
MODUL 
EIGEN 
PRACT 
FILE 
ENGIN  0.597
BLOCK  0.594
CRITE  0.590
ens?   0.587
NORMA  0.582
GRAMM  0.576
IMPRO  0.571
INDUS  0.569
RUNGE  0.567
IDENT  0.564
DEPAR  0.564
RESUL  0.563
PREFI  0.562
PL1    0.562
ALPHA  0.561
BASE   0.559
REPOR  0.553
LAW    0.549
CITAT  0.548
SET    0.543
STUDI  0.540
COMME  0.537
ONLIN  0.538
PIECE  0.527
HAND   0.522
RESEO  0.520
SENSI  0.513
ASLIB  0.506
GAMES  0.502
ROOTS  0.499
LOOP   0.493
SUCCE  0.490
HIGH   0.488
INVAR  0.486
INTRO  0.482
UTILI  0.480
SOLVI  0.479
MARKO  0.474
TOWAR  0.470
ADDIT  0.469
REQUI  0.466
DUAL   0.464
EQUIP  0.457
SHIFT  0.457
DECOM  0.451
FOURI  0.448
ACCOU  0.445
DOMAI  0.442
DELAY  0.441
CURRI  0.437
PSEUD  0.433
INQUI  0.432
SUM    0.430
SERVO  0.430
TUTOR  0.430


SOURC 
SQUAR 
CATAL 
SERIE 
FACTO 
MANIP 
SYNCH 
SEPAR 
SPACE 
EQUIV 
HARMO 
INSUR 
BRIEF 
ARGUM 
CYCLI 
SYMPO 
PUNCH 
CARD 
VIEW 
AWARE 
SURVE 
CONCE 
ANALO 
MONIT 
QUEUE 
EXTRA 
LOCAL 
PUBLI 
PULSE 
HEURI
IMPLE 
HANDL 
PERIO 
DIAGN 
CASE 
SELF 
PAPER 
FLOWC 
CHART 
FUNDA 
EXPAN 
360 
WRITI
REALI 
PREPA 
RESOU 
CHEBY 
VECTO 
PHYSI
COBOL 
PERSO 
SCHED 
WATER 
NONPA 
ANNOU 


0.597 
0.591 
0.590 
0.586 
0.580 
0.575 
0.571 
0.569 
0.567 
0.564 
0.564 
[illegible]
0.562 
0.562 
0.560 
0.558 
0.553 
0.549 
0.548 
0.543 
0.539 
0.536 
0.532 
0.526 
[illegible]
0.520 
[illegible]
0.505 
0.504 
0.499 
0.492 
0.490 
0.488 
0.486 
0.482 
0.480 
0.477 
0.473 
0.470 
0.468 
0.466 
0.461 
0.457 
0.456 
0.450 
0.447 
0.443 
0.442 
0.441 
0.437 
0.432 
0.431 
0.430 
0.430 
0.430 


THRES 
DESCR 
ACTIV 
SPECT 
ASPEC 
PATEN 
NOTE 

SELEC 
FIELD 
RECTA 
PARSE 
FREDH 
DEVIA 
KONVE 
GROUP 
TERMI 
NATIO 
CARDS 
CENTE 
PARAL 
TEACH 
QUANT 
DETEC 
PACKA 
ASSIG 
SETS 

ASSOC 
RECUR 
REGIS 
GAUSS 
PROFE 
QUASI 
RELEV 
SPEED 
TERM 

LINKS 
PRINC 
QUALI 
FIRST

STRAT 
DECOD 
NETS 

SECON 
PREDI 
REGIO 
WEIGH 
RANK 

RIGHT 
LIST

ARTIC 
COST

MAJOR 
JAPAN 
KDF9 

EXECU 


0.595 
0.591 
0.588 
0.584 
0.580 
0.573 
0.571 
0.568 
0.565 
0.564 
0.564 
0.562 
0.562 
0.562 
0.560 
0.554 
0.549 
0.549 
0.544 
0.542 
0.538
0.536 
0.531 
0.524 
0.521 
0.519 
0.512
0.505 
0.501 
0.498 
0.491 
0.488 
0.487 
0.482 
0.482 
0.480 
0.475 
0.472 
0.470 
0.468 
0.466 
0.459 
0.457 
0.453 
0.448 
0.445 
0.442 
0.441 
0.440 
0.436 
0.432 
0.430 
0.430 
0.430 
0.430 


COMPO 
PRESE 
MATHE 
SOFTW 
USER 
SMALL 
RAPID 
SYMME 
AMPLI 
FILM 
CONFE 
REDUN 
FACT 
COLLA 
[illegible]
REFER 
ADMIN 
NON 
DEVIC 
QUADR 
REMOT 
PURPO 
MARKE 
PLANN 
COMPR 
CODIN 
CONSTI 
NATUR 
PICTU
PERMU 
ALTER 
DIVES 
ALLOC 
POSIT 
CAPAC 
INDEP 
SLES 
PRECI
RELAY 
ASA 
FACIL 
BOOKS 
DEPEN 
JOURN 
COEFF 
CONDU 
PASS 
[illegible]
FINDI 
MOTIO 
PLANE 
SORT 
ACADE 
KEYWO 
BOARD 


0.595 
0.590 
0.587 
0.582 
0.578 
[illegible]
0.570 
0.567 
0.565 
0.564 
0.563 
0.562 
0.562 
0.562 
[illegible]
0.554 
0.549 
0.549 
0.543 
0.541 
0.537 
0.535
0.529 
0.523 
0.521 
0.516 
0.512 
0.502 
0.500 
0.496 
0.490 
0.488 
0.486 
0.482 
0.482 
0.480 
0.475 
0.471 
0.469 
0.468 
0.464 
0.457 
0.457 
0.452 
0.448 
0.445 
0.442 
0.441 
0.439 
0.434 
0.432 
0.430 
0.430 
0.430 
0.430 
DISTO
DISCO 
PHRAS 
REVIE 
MEETI 
ELIMI 
LENGT 
PROBS 
CEP 

PUBET 
RIGID 
CHOMS 
UNSTE 
STEPP 
BIRTH 
SIMON 
APPRA 
REPLA 
HOT 

GAIN 
FIRM 
TROUB 
LATEN 
UNSUP 
ORANI 
TRIAN 
SUBOP 
SYLLA 
SIZE 
MARKU 
OCCUR 
ORBIT 
PACKE 
MERGI 
SCHOL
AMEND 
MERCU 
CHICA 
COLOU 
APPRE 
BROMB 
SPATI 
BINDI 
VISIO 
DUPLE 
INCOR 
ATTEN 
DISAG 
TOWER 
KESLS 
SARDI 
ROYAL
FERRI 
TANDE 
STAT1 


0.430 
0.430 
0.428 
0.426 
0.421 
0.419 
0.413 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412
0.412 


PAGED 
BIBLI 
EXTRE 
SYMBO 
EXTEN 
PRIOR 
SUBMI
COURT 
RECEP 
CALL 

POLES 
POLIT 
PREFE 
LOCUS 
RCA 

PROLE 
RAY 

MARK 
RC400 
PREVI 
PRICI
DORN 

PEACE 
PAGE 

ONTAR
ELLIO 
PARIT 
PROCR 
TRACK 
ACOUT 
ACMCP 
AGENT 
PANTA 
ANTIC 
[illegible]

CC 

PANEL 
SHOP 

CONFR 
UNSTA 
LANCZ 
MARKS 
MASTE 
MANUS 
SIDES 
GIER 

LANGE 
AUTON 
HOUSIL 
HILL 

LENSE 
STOPP 
NORMS 
NETHE 
META 


0.430 
0.429 
0.428 
0.424 
0.429 
0.417 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 


FIXED 
DIAGR 
SCALE 
[illegible]
LEVEL 
ORTHO 
BELL 
BORDE 
PROMO
JORDA 
ALLIE 
ERRAT
BOND 
AERON 
SHOCK 
JACKS 
TRI 
HOBBS 
STEES 
HELIC 
[illegible]
POSTI 
RENAM 
PLAN 
SYNAP 
MAGNU 
TELEC 
NONCO 
PARTY 
STAR 
EXCLU 
BITS
OHIO
CANCE 
TRIAL 
VIRCL 
ATTEM 
TREAT 
LOAD 
LONDO 
APL 
BASSA 
VOYSE 
WILSO 
2314 
LABS 
UNVOL 
AUDIO 
ANOMA
ROBIN 
DUALI 
SIDE 
SUBSE 
SLOW 
CLENS 


0.430 
0.428 
0.428 
0.423 
0.428 
0.417 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 


GROWT 
SCHOO
SUBRE 
REMAR 
CHOIC
EDUCA 
BACKW 
POLLU 
MONOL 
DREDG 
CULTU 
BIVAR 
POSTA 
TANK 
DECOC 
CDS 
RACHF 
DEMON 
DISCT 
DOMIN 
REGEL 
SLT 
ECMA 
CANAD 
PEEKA 
SIZED 
ENDED 
BANDS 
CORNE 
SHEFF 
MEANI 
ROMAN 
PROGA 
ORDNU 
I/O
OPTIO 
DIGES 
LOCI 
CARTO 
LOSSE 
SHELL 
JUMP 
BAIRS
ROSEN 
KOSHE 
UNCON 
BROWN 
LAYOU 
FIDEL 
KNUTH 
SEMIG 
GROSS 
BANKI 
NET 
NONSI 


0.430 
0.428 
0.427 
0.422 
0.419 
0.415 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
HAMMI
ICL 
MODAL 
IMPRE 
DECAD 
IONIN 
JENKI 
INDIR 
LAGS 
HARD 
HIRSC 
HADAM 
REGIM 
LONGE 
FREED 
PERT 
HETER 
DOUGL 
DEEP 
MINER 
DRUGS 
EMPHA 
BETA 
FEASI 
DECID 
DRIFT 
DATAF 
CLARI 
SIGNE 
CBAC 
LAGRA 
FLOW 
STORE 
ASYMM 
DEMOD 
CONFO 
AUTO 
TELEM 
VIEWP 
SORTE 
THESA 
SINGU 
TINEL 
KUTTA 
POLE 
NUCLE 
ANSWE 
HEAT 


0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.408 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.398 
0.390 
0.386 
0.384 
0.374 
0.373 
0.362 
0.360 


TOTAL 


0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.407 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
[illegible]
0.389 
0.355 
0.381 
0.374 
[illegible]
0.362 
0.360 


NONDE 
MILNE 
MORPH 
METAB 
MERGE 
INTEL 
IRRED 
KEIO 

ISSUE 
INACC 
ILL 

HIGHE 
HOHER 
PAREN 
RISK 

HIDDE 
PENTO 
FLOWS 
FLOWER 
GLYCO 
EINSC 
EDELM 
MAPS 

DEFLE 
DISS L 
DATAN 
DEFER 
MOORE 
NORM 

MAEHL 
MECH 
WORKI 
RESER 
GRAHA 
TAU 

BENDI 
UNFOR 
WESCO 
SUBCO 
ROTAT 
MODIF 
POSSI 
ENCOD 
EVENT 
DECEN 
INSTR 
HARDW 
MEAN 


0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.407 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.395 
0.389 
0.385 
0.377 
0.374 
0.366 
0.361 
0.360 


NONRE 
FLUX 
MODES 
MATEM 
MERSE 
PREVE 
TONES 
POLYP 
JONES 
RAPHS 
EET 
REED 
EASTM 
ESSO 
HASH 
HABIT 
FRANK 
FRANC 
GAAS 
FLORE 
LOADI 
EXCHA 
FLEXI 
AREA 
PRACN 
LAYMA 
FIGUR
DEPOS 
DONNE 
DATAM 
COSMI 
CAVIT 
MOVIN 
IMPED 
BRAIN 
BLEND 
TOLER 
VERTE 
RETRO 
TREE 
IMPUL 
LAPLA 
EXPRE 
REGAR 
FILLI 
UNION 
COMMA 


0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412 
0.412
0.412 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.401 
0.398 
[illegible]
0.387 
0.384 
0.374 
0.374 
0.365 
0.361 
COMPU
TECHN 
PROBL 
OPTIM 
LINEA 
ALGOR 
[illegible]
APPLI 
MACHI 
EQUAT 
VARIA 
INTEG 
EXPER 
RANDO 
RECOG 
FINIT 
NUMER 
[illegible]
COMPA 
ELEME 
RELAT 
CALCU 
REGUL 
EFFIC 
ORGAN 
INPUT 
DISCR 
BINAR 
NETWO 
MAGNE 
SCHEM 
DIMEN 
COMPL
SIMPL 
SYNTA 
DISPL 
INVES 
BIT 

CHARA 
ARITH 
DERIV 
STUDY
SELEC 
HYBRI 
SOURC 
EIGEN 
SOCIA 
FILE 


0.994 
0.927 
0.900 
0.882 
0.871 
0.844 
0.829 
0.815 
0.801 
0.786 
0.776 
0.773 
0.766 
0.743 
0.728 
0.716 
0.709 
0.699 
0.698 
0.693 
0.686
0.682 
0.678 
0.675 
0.672 
0.669 
0.663 
0.657 
0.649 
0.643 
0.639 
0.634 
0.633 
0.629 
0.624 
0.620 
0.614 
0.611 
0.610 
0.605 
0.600 
0.598 
0.593 
0.590 
0.589 
0.587 
0.580 
0.576 


APPENDIX D 


SEARC 
MEMOR 
UNIVE 
BOOLE 
FILTE 
ITERA 
STORA 
NOTAT 
LARGE 
RESEA 
FEEDB 
ABSTR 
DEVEL 
ELECT 
SHARI 
ORDIN 
BOUND 
TABLE 
IBM 

MANAG 
CONTI 
ADAPT 
PERFO 
MECHA 
MODUL 
[illegible]
PHASE 
ECONO 
ASYMP 


0.772 


CONTR 
GENER 
AUTOM 
INTER 
PROGR 
MULTI 
ERROR 
CODES 
EVALU 
CORRE 
LITER 
SIMUL 
DESIG 
STATE 
PROBA 
NUMBE 
LOGIC 
ALGOL 
GRAPH 
DETER 
SAMPL 
PROPE 
FORMA 
NOISE 
COORD 
STOCH
SERVI 
CONDI 
CERTA 
SUBJE 
PARAM 
DEFIN 
FORMU 
CURRE 
CONTE 
SYNTH 
SPECI
LIMIT 
PRODU 
SERIA 
FORM 
DECIS 
PATTE 
BUSIN 
FREQU 
PRACT 
BASED 
COLLE 
0.931
0.909 
0.882 
0.874 
0.848 
0.833 
0.816 
0.809 
0.790 
0.783 
0.773 
0.767 
0.748 
0.733 
[illegible]
0.709 
0.703 
0.698 
0.694 
0.687 
0.683 
0.680 
0.676 
0.674 
0.669 
0.663 
0.661 
0.651 
0.644 
0.640 
0.637 
0.633 
0.630 
0.625 
0.621 
0.614 
0.612 
0.611 
0.605 
0.600 
0.598 
0.595
0.591 
0.590 
0.588 
0.581 
0.576 
0.571 
SQUAR
DESCR 
LEST 
MATHE 
BLOCK 
USER 
SPACE 
EQUIV 
NON 
SYMPO 
RAPID 
GROUP 
CENTE 
FILM 
STUDI 
COLLA 
REDUN 
NATIO 
TERMI 
QUADR
PARAL 
DETEC 
PURPO 
ONLIN 
COMPR 
RESPO 
HAND 
SENSI 
CONSTI 
ROOTS 
PULSE 
PICTU 
PERMU 
CAPAC 
ALTER 
ALLOC 
FACIL 
STEP 
COEFF 
MARKO 
STRAT 
AUDIT 
REQUI 
EQUIP 
JOURN 
DECOM
CHEBY 
VECTO 
PSEUD 
FINDI 
ARTIC 
PERSO 
DIAGR 
SCHED 
DISCO 


0.571 
0.567 
0.565 
0.561 
0.557 
0.551 
0.548 
0.545 
0.542 
0.540 
0.536 
0.533
0.529
0.526
0.525
0.525 
0.525 
0.524 
0.520 
0.519 
[illegible]
0.510 
0.507 
0.501 
0.493 
0.489 
0.486 
0.484 
0.480 
0.475 
0.471 
0.468 
0.465 
0.460 
0.459 
0.457
0.453 
0.449 
0.447 
0.444 
0.441 
0.439 
0.438 
0.432 
0.430 
0.424 
0.419 
0.415 
0.412 
0.411 
0.408 
0.404 
0.404 
0.402 
0.402 


ACCES 
REPRE 
CENTR 
FACTO 
CATAL 
CRITE 
MANIP 
TYPE 
CYCLI 
SEPAR 
ANALO 
RUNGE 
ALPHA 
HARMO 
INSUR 
PREFI 
FREDH 
DEVIC
CARDS 
VIEW 
ADMIN 
[illegible]
AWARE 
MARKE 
PHYSI 
EXTRA 
ASSOC 
CODIN 
RECUR 
NATUR 
GAUSS 
EDUCA 
PRINC 
SELF 
DIAGN
TERM 
CASE 
INDEP 
FOURI 
PRECI 
TOWAR 
FUNDA 
PREDI 
BOOKS 
NETS 
RESOU 
WEIGH 
DELAY 
RIGHT 
CURRI 
COBOL 
LEVEL 
MOTIO 
PAGED 
GROWT 


0.570 
0.567 
0.563 
0.560 
[illegible]
0.551 
0.547 
0.544 
0.542 
[illegible]
0.534 
0.532 
0.528 
0.526 
0.525
[illegible]
0.525
0.524
[illegible]
0.518 
0.514 
0.509 
0.507 
0.500 
0.491 
0.489 
0.485 
0.484 
0.477 
0.474 
0.470 
0.468 
0.462 
0.460 
0.459 
0.457 
0.451 
0.449 
0.447 
0.443 
0.440 
0.439 
0.433 
0.432 
0.429 
0.423 
0.418 
0.414 
0.412
0.410 
0.408 
0.404 
0.404 
0.402 
0.402 


ACTIV 
SPECT 
NORMA 
PATEN 
SMALL 
IMPRO
AMPLI 
GRAMM 
NOTE 

REPOR 
SYMME
RESUL 
BASE 

PARSE 
BRIEF 
ARGUM 
PACT 
SURVE 
CARD 

PUNCH 
CITAT
PAPER 
QUANT 
PLANN 
PIECE 
ASSIG 
SETS 

REGIS 
GAMES 
ASLIB 
HANDL 
SPEED 
HIGH 

QUASI 
SUCCE 
RELEV 
[illegible]
UTILI 
FAST 

DECOD 
RELAY 
ASA 

DUAL 

WRITI 
DEPEN 
PREPA 
CONDU 
ACCOU 
PASS 

LIST

BIBLI 
SCHOO 
PLANE 
KDF9 
KEYWO 


0.569 
0.567 
0.563 
0.560 
[illegible]
0.550 
0.545 
0.543 
0.542 
0.537 
0.534 
0.532
0.527
0.526
0.525
[illegible]
[illegible]
[illegible]
0.520 
0.517 
0.513 
0.509 
0.504 
0.495 
0.491 
0.488 
0.485 
0.480 
0.476 
0.473 
0.469 
0.467 
0.461 
0.460 
0.459 
0.456 
0.450 
0.449 
0.447 
0.442 
0.440 
0.438 
0.433 
0.432 
0.427 
0.421 
0.418 
0.414 
0.412 
0.410 
0.406 
0.404 
0.403 
0.402 
0.402 


PRESE 
COMPO 
THRES 
SERIE 
ASPEC 
SOFTW 
IDENT 
INDUS 
SYNCH 
CONCE 
CONFE 
FIELD 
DEPAR 
RECTA 
KONVE 
PL1 
DEVIA 
LAW 
REFER 
COMME 
TEACH 
MONIT 
REMOT 
PACKA 
LOCAL 
QUEUE 
PERIO 
PUBLI 
HEURI 
LOOP 
SOLVI 
SECON 
DIVES 
[illegible]
PROFE 
INVAR 
INTRO 
LINKS 
QUALI 
FLOWC 
CHART 
EXPAN 
REALI 
360 
[illegible]
REGIO 
TITLE 
RANK 
DOMAI
REVIE 
EXTEN 
COST
INQUI 
NONPA 
BOARD 
0.568
0.566 
0.562 
0.559 
0.552 
0.548 
0.545 
0.543 
0.542 
0.537 
0.534 
0.529
0.526 
0.526 
0.525 
0.525 
0.525 
0.520 
0.520 
0.517 
0.511 
0.508 
0.504 
0.494 
0.490 
0.488 
0.484 
0.480 
0.475 
0.472 
0.468 
0.466 
0.460 
0.459 
0.458 
0.454 
0.449 
0.448 
0.444 
0.441 
0.440 
0.438 
0.433 
0.430 
0.427 
0.420 
0.416 
0.412 
0.412 
0.409 
0.404 
0.404 
0.403 
0.402 
0.402 


MAJOR 
SUM 

FIXED 
EXTRE 
SYMBO 
ORTHO 
LENGT 
SLEEPS 
ROBIN 
AGENT 
SYNAP 
JACKS 
TELEC 
MERCU 
HOBBS 
ANOMA 
AERON 
BINDI 
COLOU 
GIER 
SPATI 
TRACK 
SCHOL 
[illegible]
PANTA 
RENAM 
AUDIO 
2314 

ROSEN 
[illegible]
FERRI 
LENSE 
BROWN 
RESIS 
MAGNU 
SUBSE 
WILSO 
KNUTH 
ILL 

LABS 

PROMO 
LOAD 
ISSUE 
MASTE 
NONSI 
PRICI 
DECID 
PROLE 
METAB 
POLLU 
DATAN 
FOLIa 
LAYOU
OPTIO 
FABRI 


0.402 
0.402 
0.402 
0.401 
0.398 
0.391
0.388 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 


ACADE 
SORT 

WATER 
PHRAS 
SLIBI 
CHOIC 
BACKW 
EXCLU 
TANK 
PARIT 
CHOMS 
POSTA 
ATTEN 
NONDE 
UNSTE 
GAAS 

RETAIL 
ONTAR 
SLEPP 
LOCI 
LIBER 
NORMS 
RIGID 
PERT 
GIRO 

ORDNU 
CALL 
HOHER 
BAIRS 
MERGE 
NET 

FLOWS 
EXCHA 
MODAL 
POLYP
RC400
HETER 
SUBOP 
REPLA 
LATEN 
SIMON 
BORDE 
SLT 

SHEFF 
FIRM 

HOT 

TROUB 
TREAT 
GAIN 

SHOP 
ORANI 
TRIAN 
UNSUP 
ROYAL 
SIZE 


0.402 
0.402 
0.402 
0.401 
0.394 
0.391
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 


JAPAN 
DISTO 
EXECU 
SCALE 
REMAR 
ELIMI 
SHOCK 
BELL 
TRI 
ALLIE 
MONOT 
BOND 
SIDE 
POSTE 
CARTO 
APL 
STAT1 
ACOUT 
PLAN 
SEMIG 
TOWER 
CHICA 
HELIC
ERRAT 
VISIO 
VOYSE 
SARDI 
ANTIC 
NONCO 
UNCON 
TANDE 
SLOW 
DREDG 
UNVOTL 
BASSA 
AMEND 
ACMCP 
IONES 
DEFLE 
REED 
MERSE 
OHIO 
POLES 
MOD 
ESSAY 
CREDI 
PEACE 
NICHO 
PANEL 
RACHF 
FRANK 
MAPS 
DONNE 
EDELM 
INACC 


0.402 
0.402 
0.402 
0.400 
0.393 
0.391
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 


SERVO 
TUTOR 
ANNOU 
SUBRE 
MEETI 
PRIOR 
DOUGL 
LOCUS 
JORDA 
DORN 
METHO
MARKU 
MARK 
DIGES 
LAGS 
DUPLE 
ELLIO 
DUALI 
DECAD 
[illegible]
MERGI 
PREVI 
LANCZ 
1/0 
DISAG 
FIDEL 
META 
MINER 
IMPRE 
BIRTH 
EXPOR 
SUBMI 
ECMA 
RECEP 
STAR 
APPRA 
BEREC 
IMAGE 
MICHI 
RISK 
NORM 
BANKI 
DECOC 
SIDES 
BITS 
PURSU 
OVERD 
BETA 
HIDDE 
CANAD 
GROSS 
REWRI 
BIVAR 
LAGRA 
PREVE 
0.402
0.402 
0.402 
0.399 
0.393
0.390 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
ORBIT
LONDO 
PAGE 
ESTAB 
PROCR 
HASH 
DERDEL 
NETHE 
DIODE 
FLOWR 
DEFER 
MORPH 
DEEP 
HIGHE 
FEASI
LANGE 
PIXAT 
IRRED 
EASTM 
MANUS 
ESSO 
DISS. 
HOLLA 
RAY 

INDIR 
DATAF 
[illegible]

ENDED 
PEEKA 
[illegible]

DATAM
FLOW 
SINGU 
SORTE 
ROTAT 
VIEWP 
EVIDE 
AREAS 
UNFOR 
RACHE 
BRAIN 
TREE 
POSSI 
EXPRE 
NUCLE 
POLE 


0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.381 
0.375 
0.373 
0.373
0.373
0.373
0.373
0.373 
0.373 
0.373 
[illegible]
0.362 
0.360 
0.349 
0.349 


TOTAL 


0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.380 
0.374 
0.373 
0.374
0.373 
0.373 
0.373
0.373
[illegible]
[illegible]
[illegible]
0.362 
[illegible]
0.349 
0.347 


LAYMA 
PROBS 
MODES 
MARKS 
FLUX 
JONES 
MATEM 
DRUGS 
HAMMI 
FIGUR 
DEPOS
OCCUR 
JUMP 
NONRE 
LOSSE 
RCA 
HOUSTI 
COSMI 
FRANC 
KEIO 
CULTU 
PARTY 
PUFFT 
GIVEN 
DISCT 
FLORE 
MILNE 
REGEL 
KOSHE 
INTEL 
EMPHA 
WORKI
STAGE 
SUBCO 
RETRO
SCALI 
VERTE 
WESCO 
DEMOD 
AUTO 
[illegible]
FURTH 
TLAET 
EVENT 
DECEN 
INSTR 


0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.380 
[illegible]
0.373 
0.373
0.373 
0.373 
0.373 
0.373 
0.373 
0.372 
0.364 
0.361 
0.351 
0.349 
0.344 


SIGNE 
REGIM 
CDS 
INCOR 
ELEKT 
PRACN 
AUTON 
FCC 
SHELL 
BANDS 
APPRE 
PAREN 
PENTO 
MAEHL 
BLIND 
CLENS 
MOORE 
AREA 
CANCE 
SIZED 
FREED 
CBAC 
CEE 
DEMON 
DAVIE 
IONIN 
JENKI 
GLYCO 
HADAM 
HARD 
HIRSC 
MODIF 
CAVIT 
RESER 
ASYMM 
TOLER 
MOVIN 
CONFO 
IMPED 
BLEND 
THESA 
ENCOD 
KUTTA 
REGAR 
FILLI 
0.384
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384 
0.384
0.384 
[illegible]
0.373 
0.373
0.373 
0.373 
0.373 
0.373 
0.373 
0.373 
0.372 
0.364 
0.360 
0.349 
0.349 
APPENDIX E


(I) Multi-parameter Search Example One 


QUE [illegible]


MULTI-PARAMETER SEARCH EXAMPLE ONE


AND T MANAGEMENT 300 
OR T INFORMATION 100 
OR T SYSTEMS 100 

AND T BUSINESS 300 
OR T INFORMATION 100 
OR T SYSTEMS 100

END 
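The normalized weights reported in both examples (for instance 0.905, 0.302, 0.302 from the raw card weights 300, 100, 100) agree with dividing each raw weight by the Euclidean norm of the card's weight vector. This appendix does not state that rule explicitly, so the Python sketch below is an inference from the reported numbers, and the function name `normalize_weights` is hypothetical.

```python
import math


def normalize_weights(raw_weights):
    """Scale raw query-card weights to unit Euclidean length.

    Assumption: this reproduces the normalization whose results are
    reported in Appendix E; the rule itself is inferred, not quoted.
    """
    norm = math.sqrt(sum(w * w for w in raw_weights.values()))
    return {term: w / norm for term, w in raw_weights.items()}


# Raw weights of the first request in Example One (cards above).
card = {"management": 300, "information": 100, "systems": 100}
print({term: round(w, 3) for term, w in normalize_weights(card).items()})
# {'management': 0.905, 'information': 0.302, 'systems': 0.302}
```

Applied to the Example Two weights 300, 500, 100, 300, 200, the same function gives 0.433, 0.722, 0.144, 0.433, 0.289, which matches the values quoted for that example.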


The values of T' and T are 0.466 and 0.733, respectively.
After normalization, the first request vector becomes
(management, information, systems), with the corresponding set
of weights (0.905, 0.302, 0.302). The set of search output for
parameter one is as follows:


Relevance Document 

Value 

0.655 AMDOA70210204MATHE/USING TIP SYSTE ASSIS FILE MANAG
EXERC 

0.598 AMDOA70210209HIGGE/MAYS /SALUT ASIS MANAG SYSTE EXERC


USING PL1 USING GENER PURPO SYSTE 
0.505 AMDOA69200111HELMK/MANAG COST ACCOU TECHN INFOR CENTE
0.502 AMDOA70210163COCKR/SMITH/DOUGL/BENDE/APPLI MANAG COST 
ACCOU SCIEN INFOR 
0.477 AMDOA70210214OLLE /GAGNO/SOLUT ASIS FILE MANAG EXERC

USING KCA UL1 


0.472 AMDOA70210219BLOOM/APPLI CAPRI ASIS FILE MANAG EXERC 


Since the highest relevance value of this set of documents,
0.655, lies in the interval [0.466, 0.733], an iterative search
is required. The new query vector is (management, information,
systems, using, tip, assistance, file, exercise), with the
corresponding set of weights (0.597, 0.302, 0.802, 0.0, 0.0, 0.0,
0.576, 0.0). Note that the new request vector for parameter one
now includes FILE as a significant term. The value of T is
[illegible]. By using a cutoff value of 0.550, the final set of
search output for parameter one is as follows:


Relevance Document 

Value 

0.963 AMDOA70210204MATHE/USING TIP SYSTE ASSIS FILE MANAG
EXERC 

[illegible] AMDOA70210209HIGGE/MAYS /SALUT ASIS MANAG SYSTE EXERC


USING PL1 USING GENER PURPO SYSTE 


0.699 AMDOA66170026PARKE/USERS PLACE INFOR SYSTE 
0.670 AMDOA70210040HOLBR/THREA FILE RETRI SYSTE
0.631 AMDOA70210274BURCH/ROLE FEDER GOVER INFOR SYSTE EDUCA 
0.572 AMDOA69200279SWANS/USER ORIEN INFOR SYSTE 
0.565 AMDOA68190181WALL /POSSI ARTIC INFOR SYSTE NETWO
0.550 AMDOA68190221JORDA/FRAME COMPA SDI SYSTE 
0.550 AMDOA70210160RICHM/COMPA SYSTE LABOR 
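The stopping test in these examples compares the highest relevance value of a retrieved set with the thresholds T' = 0.466 and T = 0.733: a maximum inside [T', T] triggers another feedback iteration (Example One), while a maximum above T ends the search (Example Two). A minimal Python sketch of that test follows. What the system does when every value falls below T' is not shown in these examples, so the sketch merely reports that case; the function name and return labels are hypothetical.

```python
def feedback_decision(relevances, t_lower=0.466, t_upper=0.733):
    """Decide whether another feedback iteration is needed.

    relevances: relevance values of the retrieved documents.
    Returns "accept" when max > T, "iterate" when T' <= max <= T,
    and "below_lower" when max < T' (handling of that case is an
    assumption; Appendix E does not illustrate it).
    """
    if not relevances:
        raise ValueError("no documents retrieved")
    best = max(relevances)
    if best > t_upper:
        return "accept"
    if best >= t_lower:
        return "iterate"
    return "below_lower"


# First output set of Example One, parameter one.
print(feedback_decision([0.655, 0.598, 0.505, 0.502, 0.477, 0.472]))  # iterate
# Legible values of the first output set of Example Two.
print(feedback_decision([0.907, 0.504, 0.489, 0.476]))  # accept
```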


After normalization, the second request vector becomes
(business, information, systems), with the corresponding set of
weights (0.905, 0.302, 0.302). The set of search output for
parameter two is as follows:


Relevance Document 
Value 
0.674 AMDOA68190265CUETO/USING INFOR LIFE INSUR BUSIN WORLD 


Since the only relevance value, 0.674, lies in the interval
[0.466, 0.733], an iterative search is required. The new query
vector is (business, information, systems, using, life,
insurance, world), with the corresponding set of weights
[illegible]. Note that the new request vector for parameter two
now includes INSURANCE as a significant term. The value of T is
0.565. By using a cutoff value of 0.550, the final set of search
output for parameter two is as follows:


Relevance Document 
Value 
0.965 AMDOA68190265CUETO/USING INFOR LIFE INSUR BUSIN WORLD 


0.696 AMDOA68190286MILLE/PSYCH INFOR 




0.696 
0.659 
0.639 
0.622 


0.604 


0.600 


0.597 


0.596 
0.584 
0.563 
0.558
0.556 


0.556 


AMDOA70210089OTTEN/DEBON/METAS
AMDOA66170026PARKE/USERS PLACE
AMDOA70210004HUMPH/INFOR PEACE 
AMDOA70210274BURCH/ROLE FEDER 
AMDOA68190375COTTR/EVALU COMPR 

SAFET INFOR CENTE 
AMDOA65160291GARVI/INFOR SURVE
AMDOA67180235BUCHA/HUTTO/ANALY 

NUCLE SAFET INFOR 
AMDOA68190200OCONN/QUEST CONCE
AMDOA70210095BROMB/ECONO INFOR 
AMDOA69200279SWANS/USER ORIEN 
AMDOA69200039LUNIN/ACADE INFOR 
AMDOA68190181WALL /POSSI ARTIC 


AMDOA68190305THOMP/ORGAN INFOR 


INFOR 


INFOR 


GOVER


SCIEN


MODER 


AUTOM 


INFOR 


INFOR 


CENTE 


INFOR 
SYSTE


INFOR SYSTE EDUCA 


TECHN INFOR NUCLE 
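After the feedback iteration, each parameter's final output keeps only documents whose relevance reaches the cutoff of 0.550. Because documents scoring exactly 0.550 appear in the final list for parameter one, the Python sketch below treats the cutoff as inclusive; that reading and the function name `apply_cutoff` are assumptions.

```python
def apply_cutoff(scored_docs, cutoff=0.550):
    """Keep (document_id, relevance) pairs with relevance >= cutoff, best first."""
    kept = [(doc, rel) for doc, rel in scored_docs if rel >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Two scores from Example One; the third entry is hypothetical.
scored = [
    ("AMDOA68190221JORDA", 0.550),
    ("AMDOA66170026PARKE", 0.699),
    ("HYPOTHETICAL-DOC", 0.549),
]
print(apply_cutoff(scored))
# [('AMDOA66170026PARKE', 0.699), ('AMDOA68190221JORDA', 0.55)]
```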
(II) Multi-parameter Search Example Two




QUE [illegible]
MULTI-PARAMETER SEARCH EXAMPLE TWO 


AND T COMPUTER 300 
OR T SIMULATION 500 
OR T SMALL 100 
OR T INFORMATION 300 
OR T SYSTEM 200 

AND T COMPUTER 300
OR T SIMULATION 500 
OR T SMALL 100 
OR T INFORMATION 300 
OR T NETWORK 200 

END 


After normalization, the first request vector becomes
(computer, simulation, small, information, system), with the
corresponding set of weights (0.433, 0.722, 0.144, 0.433,
0.289). The values of T' and T are 0.466 and 0.733,
respectively. The set of search output for parameter one is as
follows:


Relevance Document 

Value 

0.907 AMDOA68190120CARAS/COMPU SIMUL SMALL INFOR SYSTE 

[illegible] AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA


RETRI SYSTE 
0.552 AMDOA70210285JERMA/PROMI DEVEL COMPU ASSIS INFOR
[illegible] AMDOA64150142BOURN/FORD /COST ANALY SIMUL PROCE EVALU


LARGE INFOR SYSTE 


0.504 AMDOA66170026PARKE/USERS PLACE INFOR SYSTE 
0.489 AMDOA68190278MCCON/COMPU GRAPH ASSEM LINE INFOR 
0.476 AMDOA70210274BURCH/ROLE FEDER GOVER INFOR SYSTE EDUCA


After normalization, the second request vector becomes
(computer, simulation, small, information, network), with the
corresponding set of weights (0.433, 0.722, 0.144, 0.433,
0.289). The set of search output for parameter two is as
follows:

Relevance Document 

Value 

[illegible] AMDOA68190120CARAS/COMPU SIMUL SMALL INFOR SYSTE

0.552 AMDOA70210285JERMA/PROMI DEVEL COMPU ASSIS INFOR

0.489 AMDOA68190278MCCON/COMPU GRAPH ASSEM LINE INFOR

0.479 AMDOA68190363BAKER/NANCE/USE SIMUL STUDY INFOR STORA 


RETRI SYSTE 


Since the highest relevance values of both sets of documents
are greater than 0.733, this search request does not require
iterative searches. In both examples, the final set of search
output presented to the user is the set of documents common to
the two sets of output returned in response to the two
parameters.
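Combining the parameters amounts to intersecting their final output sets. The Python sketch below keeps the order of the first parameter's list; the accession numbers are those of Example One as transcribed in this appendix (OCR-corrected), and the function name `combine_parameters` is hypothetical.

```python
def combine_parameters(*outputs):
    """Return document IDs retrieved for every parameter, in first-list order."""
    common = set(outputs[0]).intersection(*outputs[1:])
    return [doc for doc in outputs[0] if doc in common]


# Final outputs of Example One (parameter one, then parameter two).
param_one = [
    "AMDOA70210204MATHE", "AMDOA70210209HIGGE", "AMDOA66170026PARKE",
    "AMDOA70210040HOLBR", "AMDOA70210274BURCH", "AMDOA69200279SWANS",
    "AMDOA68190181WALL", "AMDOA68190221JORDA", "AMDOA70210160RICHM",
]
param_two = [
    "AMDOA68190265CUETO", "AMDOA68190286MILLE", "AMDOA70210089OTTEN",
    "AMDOA66170026PARKE", "AMDOA70210004HUMPH", "AMDOA70210274BURCH",
    "AMDOA68190375COTTR", "AMDOA65160291GARVI", "AMDOA67180235BUCHA",
    "AMDOA68190200OCONN", "AMDOA70210095BROMB", "AMDOA69200279SWANS",
    "AMDOA69200039LUNIN", "AMDOA68190181WALL", "AMDOA68190305THOMP",
]
print(combine_parameters(param_one, param_two))
# ['AMDOA66170026PARKE', 'AMDOA70210274BURCH', 'AMDOA69200279SWANS', 'AMDOA68190181WALL']
```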